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ABSTRACT 


This  dissertation  explores  the  topic  of  detecting  and  localizing  change  in  a  series  of 
multivariate  data  using  graph-theoretic  statistical  criteria.  Change-detection  methods 
based  on  graph  theory  are  emerging  due  to  their  ability  to  detect  change  of  a  general 
nature  with  desirable  power  properties.  The  graph-theoretic  structures  of  minimum  non- 
bipartite  matching  and  nearest  neighbors  according  to  distances  between  observations 
form  the  basis  of  our  statistical  procedures.  We  consider  the  computation  time  to 
implement  the  procedures  with  the  detection  power  of  the  derived  statistics.  In  a 
simulation  study,  we  evaluate  the  power  of  our  proposed  statistical  tests  in  a  series  of 
vignettes  in  which  the  sampling  distribution,  dimensionality,  change  parameter  (location 
or  scale),  change  type  (abrupt  or  gradual),  and  change  magnitude  each  are  allowed  to 
vary.  We  compare  detection  power  with  contemporary  parametric  and  graph-theoretic 
approaches.  Although  our  tests  alone  do  not  provide  the  information  needed  to  localize  a 
change  point,  we  develop  a  follow-on  procedure  that  satisfies  this  objective.  We  illustrate 
our  proposed  statistical  tests  and  change-point  localization  techniques  in  an  application, 
which  demonstrates  how  several  of  the  apparent  limitations  of  our  approach  can  be 
surmounted. 
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EXECUTIVE  SUMMARY 


This  dissertation  explores  the  topic  of  detecting  and  localizing  change  in  a  series  of 
multivariate  data  using  graph-theoretic  statistical  criteria.  A  cluster  of  fire  detection 
sensors  produce  such  a  data  series;  detecting  change  may  mean  the  onset  of  a  fire  or  the 
degradation  of  a  sensor.  This  topic  is  of  importance  to  many  other  fields  of  practice, 
including  quality  assurance,  surveillance,  remote  sensing,  and  public  health.  Considering 
that  distributional  assumptions  become  more  difficult  to  verify  as  the  dimensionality  of 
observations  increases,  nonparametric  methods  are  gaining  traction  in  the  study  of 
multivariate  change  detection.  As  a  subset  of  the  latter,  methods  based  on  graph  theory 
are  emerging  due  to  their  ability  to  detect  change  of  a  general  nature  with  desirable  power 
properties. 

Graph-theoretic  statistics  are  derived  from  a  complete  graph  in  which 
observations  are  taken  as  vertices,  and  either  one  edge  (undirected)  or  two  edges 
(directed)  connect  each  pair  of  vertices.  An  edge  weight  is  assigned,  consisting  of  the 
dissimilarity  or  distance  between  the  two  adjacent  vertices.  A  subgraph  class,  such  as  one 
consisting  of  spanning  trees  or  matchings,  imposes  a  particular  structure  on  its  vertices 
and  edges.  A  minimum  subgraph  is  a  member  of  a  subgraph  class  that  minimizes  the  sum 
of  edge  weights.  For  example,  a  minimum  spanning  tree  is  the  least-weight  acyclic 
subgraph  that  contains  a  path  from  each  vertex  to  every  other  vertex,  and  a  minimum 
non-bipartite  matching  is  a  subgraph  that  pairs  all  observations  so  that  the  total  weight  of 
the  edges  is  minimized.  Recent  statistical  literature  (Chen  &  Zhang,  2015;  Friedman  & 
Rafsky,  1979;  Rosenbaum,  2005;  Ruth,  2009;  Ruth  &  Koyak,  2011)  establishes  that 
statistics  derived  from  minimum  subgraphs  are  effective  at  extracting  information  about 
the  stability  of  the  sampling  distributions  from  which  a  sequence  of  multivariate 
observations  is  obtained. 

Our  examination  of  graph-theoretic  change  detection  techniques  is  motivated  by 
the  need  to  find  new  methods  that  are  both  computationally  efficient  and  powerful.  In  his 
Ph.D.  dissertation,  Ruth  (2009)  demonstrates  that  ensembles  of  edge-disjoint  non- 
bipartite  matchings,  which  are  large  collections  of  minimum  subgraphs,  allow  the 


xv 


development  of  statistical  tests  that  match  the  performance  of  the  best  parametric 
statistical  methods  when  the  assumptions  justifying  use  of  the  latter  are  true.  Finding 
ensembles,  however,  requires  computation  time  on  the  order  of  the  fourth  power  of  the 
sample  size,  which  greatly  restricts  their  use  on  large  samples.  We  develop  statistical 
procedures  for  two  less-computationally  intensive  subgraphs,  minimum  /c-f actors  and 
^-nearest  neighbors  (fcNN),  to  approximate  the  information  obtained  from  ensembles.  A 
^-factor  is  a  subgraph  in  which  each  vertex  is  incident  to  exactly  k  undirected  edges. 
Although  an  ensemble  of  minimum  non-bipartite  matchings  generates  a  sequence  of 
nested  factors,  our  approach  is  to  find  a  minimum  ^-factor  without  sequential  generation, 
which  reduces  computation  time.  For  a  &NN  subgraph  each  vertex  is  associated  with  k 
other  vertices  that  have  the  least  directed  edge  weights.  Efficient  algorithms  for  finding 
A'NN  subgraphs  are  widely  available  due  to  their  importance  in  a  wide  range  of 
applications.  For  the  same  number  of  edges  it  is  possible  to  obtain  a  A'NN  in  a  small 
fraction  of  the  computation  time  required  to  solve  for  an  ensemble.  Additionally, 
updating  a  A'NN  by  adding  or  removing  observations  is  relatively  straightforward,  an 
attribute  not  shared  by  ensembles  and  minimum  ^-factors. 

We  propose  a  class  of  statistical  tests  for  detecting  change  in  the  distribution  of  a 
sequence  of  independent  multivariate  observations  based  on  summing  scores  assigned  to 
the  edges  comprising  either  a  minimum  ^-factor  or  a  &NN  subgraph.  An  edge  score  is  a 
function  of  the  sequence  labels  assigned  to  the  two  incident  vertices.  Investigations 
dating  to  Wald  and  Wolfowitz  (1940)  have  noted  that  the  edges  of  minimum  subgraphs 
tend  to  associate  observations  from  the  same  distribution  when  the  sample  is  composed 
of  observations  from  two  different  distributions.  Ruth  (2009)  extends  this  property  in 
finding  that  the  vertex  labels  tend  to  be  closer  in  sequence  than  expected  when  a  change 
in  distribution  is  either  abrupt  or  gradual.  Our  proposed  tests  not  only  demonstrate  Ruth’s 
findings  on  two  new  classes  of  minimum  subgraphs  but  also  consider  the  best 
formulation  obtained  by  allowing  both  the  graph  complexity  k  and  the  edge  scoring  to 
vary. 

In  a  simulation  study,  we  evaluate  the  power  of  our  proposed  statistical  tests  in  a 
series  of  vignettes  in  which  the  sampling  distribution,  dimensionality,  change  parameter 


(location  or  scale),  change  type  (abrupt  or  gradual),  and  change  magnitude  each  are 
allowed  to  vary.  Although,  as  expected,  our  tests  cannot  match  the  performance  of 
ensemble -based  tests,  we  find  that  the  difference  is  small  taking  into  consideration  the 
increased  computational  flexibility  that  our  tests  afford. 

If  the  null  hypothesis  of  no  distributional  change  is  rejected  by  our  tests,  it  is  of 
interest  to  estimate  the  point  in  the  observation  sequence  at  which  such  change  occurred. 
Although  our  tests  alone  do  not  provide  the  information  needed  to  localize  a  change 
point,  we  develop  a  follow-on  procedure  that  satisfies  this  objective.  Our  procedure 
consists  of  fitting  a  series  of  broken-line  regressions  to  standardized  counts  of  edges  that 
associate  a  given  observation  to  those  occurring  earlier  in  the  sequence,  using 
hypergeometric  distributions  that  apply  if  the  null  hypothesis  of  no  distributional  change 
were  true.  The  break  point  of  the  broken-line  regression  that  minimizes  the  sum  of 
squared  errors  is  taken  as  the  change-point  estimator.  We  evaluate  the  performance  of 
this  estimator  in  a  series  of  simulation  vignettes  and  find  that  its  accuracy  reflects  the 
power  of  the  procedure  that  was  used  to  conduct  the  original  hypothesis  test. 

There  are  limitations  to  the  applicability  of  the  graph-theoretic  statistical 
procedures  that  we  propose.  First,  these  procedures  are  designed  for  independently 
sampled  observations,  an  assumption  that  frequently  is  violated  in  practice.  Second,  these 
procedures  are  powerful  in  detecting  any  type  of  distributional  change,  including  those 
for  which  detection  may  not  be  desired.  And  third,  these  procedures  are  not  designed  for 
real-time  implementation.  In  some  cases,  it  is  possible  to  surmount  these  limitations, 
which  we  demonstrate  with  an  application  of  our  procedures  to  the  fire-detection  data  of 
Gottuk  et  al.  (2002).  The  presence  of  a  set  of  control  data  allows  the  fitting  of  a  time- 
series  model,  based  only  on  current  and  past  observations,  that  removes  much  of  the 
temporal  dependence.  Low-level  stimuli  that  affect  the  sensors  but  do  not  justify  the 
setting  of  an  alarm  are  handled  by  filtering  with  a  scatterplot  smoother  that  is  truncated  at 
a  determined  threshold  value.  And,  although  we  process  the  data  off-line  in  batches,  we 
do  so  in  a  manner  that  near-real  time  detectability  may  be  possible. 
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I. 


INTRODUCTION 


Detecting  change  in  the  underlying  probability  distribution  of  a  sequence  of 
observations  is  an  important  research  area  in  statistics  and  other  disciplines.  The  nature  of 
change  may  be  an  abrupt  shift  or  a  gradual  drift  in  a  distributional  aspect  such  as  the 
mean,  variance,  or  a  higher  moment.  The  ability  to  detect  change  assists  in  a  myriad  of 
situations,  for  instance  diagnosing  machinery  health,  monitoring  a  manufacturing 
process,  examining  social  network  behavior,  considering  the  possibility  of  a  viral 
outbreak,  or  assessing  sensor  network  operation. 

JiJi,  Hammond,  Williams,  and  Rose-Pehrsson  (2003)  provide  an  example  of  the 
role  change  detection  plays  in  monitoring  a  collection  of  sensors.  The  authors  describe  an 
experiment  in  which  four  types  of  sensors — photoelectric  and  ionization  smoke  detectors, 
carbon  monoxide  and  carbon  dioxide  sensors — are  used  to  determine  the  point  at  which  a 
combustion  event  is  detectable.  The  sensors  in  this  experiment  are  distributed  on  a 
decommissioned  naval  vessel.  Twenty-four  events,  ranging  from  benign  heat-generating 
activities  to  smoldering  and  flaming  fires,  are  staged  in  various  shipboard  locations.  Of 
interest  is  the  time  that  it  takes  for  the  sensor  network  to  detect  a  change  that  signals  a 
combustion  event. 

If  an  analysis  detects  change,  it  is  natural  to  ask  where  in  the  sequence  the  change 
occurred.  Localizing  the  point  in  the  observational  sequence  at  which  the  change  began  is 
broadly  known  as  the  change-point  problem,  or  in  Eastern  literature,  the  disorder  problem 
(Brodsky  &  Darkhovsky,  1993).  Many  methods  for  change-point  localization  include  an 
initial  test  for  a  distributional  change  (see  Bhattacharya,  1994  and  Chen  &  Gupta,  2001). 

Change  detection  and  localization  techniques  may  be  categorized  as  either  static 
or  sequential.  Static  analysis  is  done  retrospectively;  sequential  analysis  is  done 
concurrent  with  observation.  Static  change-detection  applications  arise  out  of  forensic 
situations,  such  as  analyzing  vehicle  death  rates  before  and  after  passage  of  a  seat  belt 
law  (Bhattacharyya  &  Layton,  1979).  Sequential  change  detection  occurs  in  cases  where 
response  time  is  a  priority,  such  as  fire  detection.  Quality  control  specialists  use 
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multivariate  sequential  analysis  techniques  such  as  Shewart  charts  (Hotelling,  1947), 
cumulative  sum  (CUSUM)  schemes  (Woodall  &  Ncube,  1985;  Crosier,  1988)  and 
exponentially  weighted  smoothing  (Lowry,  Woodall,  Champ,  &  Rigdon,  1992)  to  signal 
degradation  in  quality.  Static  analysis  allows  more  computation  time  compared  to 
sequential  analysis.  Static  change  detection  is  also  described  in  literature  as  non¬ 
sequential,  offline,  retrospective,  or  posterior,  whereas  sequential  analysis  may  be 
referred  to  as  real-time,  dynamic,  or  streaming.  Sequential  testing  to  localize  change 
points  in  multivariate  streaming  data  is  a  growing  area  of  research,  notably  in  the 
machine  learning  community  (see  for  example,  Dries  &  Riickert,  2009). 

Change-detection  methods  are  also  characterized  as  either  parametric  or  non- 
parametric.  Parametric  approaches  to  change  detection  rely  upon  distributional 
assumptions,  typically  normality.  These  assumptions  are  difficult  to  verify  as  the  number 
of  variables  increases,  especially  if  the  number  of  variables  is  large.  Although  approaches 
based  on  multivariate  normality  are  commonly  used,  especially  in  quality  control, 
nonparametric  approaches  are  more  attractive  for  multivariate  investigations.  There  is  a 
need  for  nonparametric  approaches,  given  that  data  often  fail  to  meet  distributional 
assumptions. 

The  purpose  of  this  dissertation  is  to  propose  multivariate,  nonparametric  methods 
for  change  detection  and  change-point  localization  using  graph-theoretic  techniques.  We 
limit  our  investigation  to  static  change  analysis,  and  through  an  application  show  that  our 
methods  are  adaptable  for  near- sequential  analysis. 

Graph-theoretic  techniques  for  change  detection  have  their  origin  in  the  famous 
runs  test  of  Wald  and  Wolfowitz  (1940),  which  provides  a  univariate  approach  to  the 
two-sample  problem.  Friedman  and  Rafsky  (1979)  develop  a  multivariate  extension  of 
the  runs  test  using  minimum  spanning  trees.  Rosenbaum  (2005)  uses  a  test  based  on 
matching  techniques  to  detect  change  in  multivariate  data.  Matching  methods  are 
extended  by  Ruth  (2009)  and  Ruth  and  Koyak  (2011)  to  include  richer  information  from 
the  graph.  They  propose  a  graph-theoretic  test  for  multivariate  change  detection  with 
power  comparable  to  parametric  procedures  when  parametric  assumptions  are  met.  These 
methods  are  described  in  Chapter  II. 
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This  dissertation  is  organized  as  follows.  In  Chapter  II,  we  introduce  terminology 
and  review  the  literature  on  change  detection,  change-point  localization,  and  graph- 
theoretic  methods  in  the  multivariate  setting.  Chapter  III  introduces  statistical  tests  based 
on  three  graph-theoretic  statistics:  the  A- factor  sum  of  pair-differences,  the  A-nearcst- 
neighbors  sum  of  pair-differences,  and  the  nearest-neighbors  prior-edge  count.  We 
discuss  computational  considerations  for  extracting  information  from  graphs,  explore  the 
test  statistic  formation  and  characteristics,  and  consider  approaches  to  obtaining  a  p-value 
for  testing  purposes.  Through  a  simulation  study,  the  power  of  these  tests  to  detect 
change  is  estimated.  In  Chapter  IV,  we  present  and  evaluate  two  graph-theoretic  change- 
point  localization  techniques  for  an  observation  sequence  containing  an  underlying 
distributional  change.  In  Chapter  V,  we  apply  our  graph-theoretic  tests  and  change-point 
localization  techniques  to  the  fire  detection  sensor  data  considered  by  JiJi  et  al.  (2003). 
Consideration  is  given  to  adapting  time- series  data  to  tests  that  assume  data 
independence.  Chapter  VI  summarizes  our  findings  and  discusses  areas  for  further 
research  in  this  field. 
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II.  REVIEW  OF  GRAPH-THEORETIC  METHODS  FOR 

CHANGE  DETECTION 


In  this  chapter,  we  introduce  terminology  and  review  recent  advances  in  graph- 
theoretic  change  detection  and  change-point  localization  that  are  relevant  to  our 
investigation. 

A.  A  BIVARIATE  EXAMPLE 

We  use  the  two  data  sets  listed  in  Table  1  to  illustrate  change-detection  methods. 
The  first  data  set  (no-change  case)  consists  of  20  observations  y1,...,y;,...,y  sampled 

from  a  standard  bivariate  normal  distribution.  The  second  data  set  (change  case)  has  10 
observations  sampled  from  a  standard  bivariate  normal  distribution,  followed  by  10 
observations  sampled  from  a  bivariate  normal  distribution  with  identity  covariance  matrix 
and  mean  vector  (3,0). 


Table  1.  Bivariate  example  data.  The  no-change  case  is  to  the  left,  with  20 
observations  drawn  from  a  standard  bivariate  normal  distribution. 
The  change  case  is  to  the  right,  with  observations  1 1-20  drawn 
from  a  bivariate  normal  distribution  having  mean  vector  (3,0). 


Obs.  (0 

yn 

yn 

Obs.  (0 

yn 

yn 

1 

-0.435 

0.957 

1 

0.433 

0.523 

2 

-0.258 

-0.340 

2 

0.675 

-0.111 

3 

0.386 

0.034 

3 

-0.740 

-0.917 

4 

-0.954 

-0.224 

4 

-0.412 

-0.367 

5 

-1.678 

0.490 

5 

0.583 

0.065 

6 

-0.621 

1.168 

6 

-1.068 

-0.699 

7 

1.573 

0.971 

7 

0.196 

0.070 

8 

0.173 

-0.787 

8 

1.086 

0.309 

9 

0.881 

-0.527 

9 

1.690 

0.337 

10 

-0.591 

-0.802 

10 

-0.105 

0.126 

11 

1.007 

-0.045 

11 

2.884 

-0.384 

12 

-0.839 

-2.092 

12 

3.324 

-0.320 

13 

-0.315 

-1.238 

13 

2.342 

-0.168 

14 

-0.515 

-1.186 

14 

3.584 

0.448 

15 

-0.458 

0.104 

15 

3.794 

0.559 

16 

-0.488 

-1.359 

16 

1.920 

0.937 

17 

-1.610 

0.282 

17 

4.437 

1.363 

18 

-0.786 

0.488 

18 

2.948 

-0.445 

19 

-0.076 

-1.044 

19 

1.920 

-0.323 

20 

2.180 

0.008 

20 

3.165 

-2.158 
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To  use  the  fire  detection  example  we  explore  in  Chapter  V,  consider  the  observation  label 
(0  as  a  time  index  and  the  two  variables  (v,i  and  y&)  as  standardized  measurements  from 
an  ionization  smoke  detector  and  a  photoelectric  smoke  detector,  respectively.  The 
change  case  records  an  event  occurring  between  observations  10  and  11  that  only  triggers 
a  response  from  the  ionization  smoke  detector.  Figure  1  depicts  the  data  graphically. 
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Figure  1.  Bivariate  example  data,  labeled  by  observation  index.  The  no-change 
case  is  to  the  left:  20  observations  drawn  from  a  standard  bivariate 
normal  distribution.  Observations  1 1-20  are  colored  gray  to  facilitate 
later  comparisons.  The  change  case  is  to  the  right:  20  observations 
drawn  from  a  bivariate  normal  distribution  with  identity  covariance 
matrix  and  an  abrupt  change  in  the  mean  vector  from  (0,  0)  to  (3,  0) 
starting  at  observation  11.  Observations  1-10  are  shown  in  blue, 
observations  11-20  are  shown  in  yellow.  The  circles  marking 
observations  11  and  18  overlap. 


B.  PARAMETRIC  APPROACHES  TO  CHANGE  DETECTION 

Much  of  the  change  detection  and  localization  literature  concerns  parametric 
methods.  A  recent  bibliography  on  the  change-point  problem  (change  detection  and 
change-point  localization)  is  compiled  by  Lee  (2010),  stressing  the  relative  dearth  of 
nonparametric  methods.  Khodadadi  and  Asgharian  (2008)  provide  a  bibliography 
concerning  the  change-point  problem  in  the  context  of  regression.  Tran,  Gaber,  and 
Sattler  (2014)  describe  recent  change-detection  efforts  as  applied  to  streaming  data. 
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James,  James,  and  Siegmund  (1992)  present  a  likelihood  ratio  test  approach  for 
detecting  a  change  in  the  mean  vector  of  a  multivariate  normal  distribution.  The  James, 
James,  Siegmund  (JJS)  method  remains  a  pillar  in  parametric  change  detection  (Chen  & 
Gupta,  2011),  and  we  use  this  test  as  an  example  of  a  parametric  approach  to  compare 
with  the  graph-theoretic  methods  presented  later.  JJS  formulate  the  test  statistic  for 
detecting  a  single  change  in  the  mean  vector  of  a  multivariate  normal  distribution  as 
follows.  For  independent  p-dimcnsional  multivariate  normal  observations 
Y1,...,Y.,...,YJV  with  a  common  covariance  matrix  and  a  change-point  candidate  Tin 

|tq,. ..,Tj},  the  modified  likelihood  ratio  test  statistic  is 


N 


Tm  =  max 

r0<r<r,  t(N-T) 


f  T 


1 


un--sns'n 

V  yv  J 


TV  T  \ 

S’-NS« 


V 


(1) 


s,=£y„  v,=XYX 


i=l 


i=l 


JJS  also  present  an  approximation  to  the  tail  probabilities  as  follows: 


Pr  [rJJS  >  ^]  = 

1  f  Nx  ,  _  \(N-P- 3)/2  r'l  1 

1  j  r(l-r) 


r(p/2) 


V  2  J 


t(l-t)(l-.r) 


dt,  0  <  x  <  1, 


(2) 


where  T0/N  — and  tJN  —>tl  as  TV  — 0<t0<t1<l,  and 


'(/)  =  -cxp|- 


(3) 


where  Of/)  is  the  standard  normal  cumulative  distribution  function.  The  JJS  test  detects 
change  in  an  interval  of  possible  change  points,  but  does  not  attempt  to  localize  it.  For  the 
example  data  sets  of  Section  A,  we  obtain  JJS  p- values  of  0.17  for  the  no-change  case, 
and  p-value  0.000029  for  the  change  case.  The  conclusion  is  that  there  is  a  distributional 
change  in  the  change  case. 

Parametric  methods  are  developed  for  specific  distributions  and  may  lack 
robustness  when  applied  to  other  underlying  data  distributions,  as  can  be  the  case  when 
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the  sampling  distribution  is  unknown.  This  was  explicitly  shown  by  Ruth  (2009)  for  JJS. 
We  next  describe  nonparametric  approaches  to  change  detection  in  multivariate  data. 

C.  NONPARAMETRIC  APPROACHES  TO  CHANGE  DETECTION 

An  overview  of  both  static  and  sequential  nonparametric  change-detection 
methods  is  contained  in  Chapter  2  of  Brodsky  and  Darkhovsky  (2000).  A  nonparametric 
approach  to  detect  and  localize  mean  change  in  multivariate  data  is  introduced  in 
Darkhovsky  (1994)  using  the  family  of  Kolmogorov-Smimoff  type  statistics.  Qui  and 
Hawkins  (2003)  present  a  nonparametric  cumulative  sum  (CUSUM)  procedure  for 
detecting  shifts  in  the  mean  vector  of  a  multivariate  measurement.  Their  distribution-free 
procedure  uses  order  information  from  both  the  measurements  and  their  null  hypothesis 
means.  Lung-Yut-Fong,  Levy-Leduc,  and  Cappe  (2012)  contribute  a  nonparametric  two- 
sample  homogeneity  test  for  multivariate  data  based  on  the  Wilcoxon  rank  statistic.  They 
apply  their  test  to  static  change  detection  to  estimate  multiple  change  points.  Li,  Zhang, 
and  Jeske  (2013)  propose  two  multivariate  CUSUM  procedures  as  nonparametric 
counterparts  to  the  parametric  multivariate  CUSUM  of  Crosier  (1988).  The  two 
procedures,  based  on  spatial  sign  and  data  depth,  show  power  equivalent  to  Crosier’s 
multivariate  CUSUM  in  detecting  location  and  scale  changes  when  applied  to  normally 
distributed  data,  and  higher  power  than  Crosier’s  test  when  applied  to  non-normally 
distributed  data.  Holland  and  Hawkins  (2014)  provide  a  nonparametric  CUSUM  method 
of  detecting  location  changes  in  multivariate  data  without  requiring  significant  data 
history.  Their  procedure  is  based  on  an  approximately  distribution-free  generalization  of 
the  Wilcoxon-Mann- Whitney  test.  Preuss,  Puchstein,  and  Dette  (2014)  pursue  spectral 
distribution  estimation  to  detect  multiple  covariance  structural  changes  of  a  multivariate 
process.  Matteson  and  James  (2014)  use  nonparametric  hierarchical  clustering  to  conduct 
static  change  detection  and  localization  in  multivariate  data.  Their  procedure  allows  for 
detection  of  distributional  change  of  any  nature  and  includes  testing  for  estimating 
multiple  change  points. 
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D.  GRAPH-THEORETIC  APPROACHES  TO  CHANGE  DETECTION 

An  area  of  nonparametric  change  analysis  that  has  shown  promise  is  graph- 
theoretic  statistics.  We  review  graph-theoretic  procedures  for  change  analysis,  starting 
with  methods  developed  for  the  classic  two-sample  problem  (which  we  specify  below), 
and  proceed  to  current  change  detection  and  localization  techniques. 

1.  Graph  Theory  Terminology 

We  introduce  a  few  graph  theory  concepts,  following  conventions  in  West  (2001). 
These  terms  are  summarized  in  Table  2  at  the  end  of  this  chapter.  A  graph  G  consists  of  a 
vertex  set  V,  an  edge  set  E.  and  a  relationship  between  vertices  and  edges.  We  convey  the 
vertex-edge  relationship  in  a  drawing  with  vertices  as  points  and  edges  as  lines,  or  by 
naming  each  edge  by  its  two  incident  vertices,  e.g.,  edge  { i,  j } .  The  two  vertex  endpoints 
of  an  edge  are  neighbors  and  adjacent.  A  simple  graph  has  no  duplicate  edges  or  edges 
with  the  same  starting  and  ending  vertex.  This  dissertation  is  restricted  to  simple  graphs. 
The  degree  of  a  vertex  is  the  number  of  incident  edges.  All  vertices  of  a  regular  graph 
have  the  same  degree,  say  k;  such  a  graph  is  ^-regular.  A  directed  graph  specifies  an 
ordered  pair  of  vertices  for  each  edge:  (tail,  head).  For  example,  directed  edge  ( i,j )  goes 
from  vertex  i  to  vertex  j.  For  a  directed  graph,  the  outdegree  of  vertex  i  is  the  number  of 
edges  with  tail  i;  the  indegree  of  vertex  i  is  the  number  of  edges  with  head  i.  Edges  may 
be  assigned  edge  weights  according  to  some  nonnegative  measure  of  distance  or  cost.  A 
complete  graph  has  edges  between  all  possible  vertex  pairs. 

Having  defined  the  basic  elements  of  a  graph,  we  continue  with  additional 
concepts.  A  subgraph  is  a  subset  of  a  graph;  said  graph  contains  the  subgraph.  A 
spanning  subgraph  of  a  graph  is  a  subgraph  including  all  vertices  of  the  graph.  A  factor 
is  a  spanning  subgraph.  A  k-factor  is  a  spanning  ^-regular  subgraph.  A  minimum 
spanning  subgraph  with  property  P  is  the  spanning  subgraph  of  G  with  property  P  that 
minimizes  the  sum  of  the  included  edge  weights.  Subgraphs  having  no  vertices  in 
common  are  disjoint,  while  subgraphs  having  no  edges  in  common  are  edge -disjoint. 
Two  or  more  edge-disjoint  subgraphs  with  identical  vertex  sets  are  an  ensemble.  A 
matching  is  a  set  of  edges  with  no  shared  endpoints.  A  perfect  matching  is  a  matching 
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incident  to  every  vertex.  The  edge  set  of  a  1 -factor  is  a  perfect  matching.  An  undirected 
nearest-neighbor  subgraph  includes  only  those  edges  connecting  each  observation  to  its 
nearest  (minimum  edge  weight)  neighbor. 

Continuing  our  introduction  of  graph  theory  concepts,  a  path  is  a  graph  whose 
vertices  are  adjacent  if  and  only  if  they  are  consecutive  in  an  ordered  list.  A  Hamiltonian 
path  is  a  spanning  path.  A  cycle  is  a  closed  path  where  the  first  and  last  vertices  are  the 
same.  A  connected  graph  contains  at  least  one  path  from  each  vertex  to  every  other 
vertex.  A  tree  is  an  acyclic  connected  graph.  A  spanning  tree  is  a  spanning  subgraph  that 
is  a  tree.  The  edge  cardinality  |ii|  of  a  spanning  tree  is  one  less  than  the  vertex  cardinality 

|Zt|.  A  minimum  spanning  tree  (MST)  is  the  spanning  tree  that  minimizes  the  sum  of  the 

included  edge  weights.  A  degree-constrained  MST  (A' MST)  has  the  additional  constraint 
that  the  maximum  degree  of  each  vertex  is  k.  A  2MST  is  a  shortest  Hamiltonian  path. 

Now  we  apply  graph  theory  to  change  detection.  We  represent  the  data  under 
analysis  as  a  complete  graph.  Let  N  be  the  number  of  observations.  Each  observation  is  a 
vertex,  labeled  sequentially  from  1  to  N,  while  undirected  edges  represent  connections 
between  observations  (we  consider  directed  edges  later).  A  complete  graph  of  N 
observations  has  edge  cardinality  \E\  =  N  (N  -l)/2 ,  and  each  of  the  N  vertices  is  of 

degree  N~  1.  Edge  weights  ( dij )  reflect  a  standard  distance  metric  such  as  Euclidean, 

Manhattan  (L,  distance),  Chebyshev  ( E  metric),  or  Mahalanobis  (scale-invariant) 

distance.  To  detect  a  change  in  the  underlying  distribution  of  the  data,  we  analyze  the 
behavior  of  minimum  spanning  subgraphs. 

2.  The  Two- Sample  Problem 

The  two-sample  problem  is  posed  as  follows.  Let  Y1,...,Y,...,Y  represent 

p-dimcnsional  observations,  such  that  n\  observations  are  independently  sampled  from 

distribution  F\,  and  m2  observations  are  independently  sampled  from  distribution  F2 

[N  =  /77j  +  /«,,)  .  Two  distributions  are  equal  (F\  =  Fi)  if  and  only  if  their  cumulative 

distribution  functions  F\(t )  and  Fi(t)  and  equal  for  all  I.  Based  on  the  samples,  is  there 
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enough  evidence  to  reject  the  null  hypothesis  that  the  distributions  F \  and  F2  are 
identical?  H0:  F\  =  F2,  H\.F\±  F2. 

For  univariate  observations  (p  =  1),  Wald  and  Wolfowitz  (1940)  propose  a 
statistic  derived  by  sorting  the  data  in  non-decreasing  order  regardless  of  the  sample 

source  (e.g.,  Yl<...<Y.  <...<YN,  V/e{2,...,./V-l}).  We  count  the  total  number  of 

runs,  a  run  being  one  or  more  consecutive  observations  (as  sorted)  from  one  source.  A 
relatively  small  count  of  runs  is  indicative  of  different  underlying  distributions. 
Subsequent  authors  extend  the  two-sample  problem  to  change  analysis  where  F \  and  F2 
are  multivariate. 


3.  A  Test  Using  the  Minimum  Spanning  Tree 

Friedman  and  Rafsky  (1979)  propose  a  MST-based  test  to  generalize  the  Wald- 
Wolfowitz  runs  test  to  multivariate  data.  The  context  again  is  the  two-sample  problem,  in 
which  there  are  two  samples  of  known  size  (m,,m2)  drawn  from  two  unknown 

multivariate  distributions  (FVF2).  The  observations  are  represented  as  a  complete  graph, 

with  the  vertices  (observations)  identified  by  sample  source,  edges  linking  all  pairs  of 
vertices,  and  edge  weights  dij  given  by  Euclidean  distance  or  some  other  measure  of 
vertex  dissimilarity.  The  MST  has  edge  cardinality  \Emst\  =  N - 1.  An  edge  indicatory  is 

assigned,  which  is  equal  to  one  if  edge  e  links  vertices  from  different  samples,  and  xe  =  0 
otherwise.  All  edges  with  indicator  xe=  1  are  deleted,  and  the  resulting  number  of  disjoint 
subtrees  R  is  obtained.  This  results  in  a  statistic  similar  to  a  runs  test: 


Y.-Y. 

'  i 


X  = 


1 1,  if  edge  e  links  vertices  from  different  samples 
[O,  otherwise 


,  Ve  g  {!,..., A -l} 


(4) 


N-l 


*=2>,+i- 


This  statistic  is  nonparametric,  but  it  is  not  distribution-free.  Friedman  and  Rafsky  show 

the  variance  of  R  depends  on  the  topology  of  the  MST;  specifically,  the  variance  depends 
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on  number  of  MST  edges  that  share  a  common  vertex.  Standardizing  the  R  statistic  based 
on  the  null  hypothesis  expected  value  and  conditional  variance  provides  a  Z  score  that 
can  be  compared  to  the  standard  normal  distribution  to  obtain  a p- value. 

A  relatively  small  disjoint  subtree  count  indicates  heterogeneity  between  the  two 
samples.  The  extreme  cases  give  insight  as  to  why  this  test  is  able  to  detect  heterogeneity. 
The  minimum  possible  value  of  R  is  2,  indicating  that  the  MST  includes  only  a  single 
edge  incident  to  observations  in  both  samples.  The  maximum  possible  value  of  R  is  N, 
indicating  that  the  MST  consists  solely  of  edges  incident  to  observations  in  both  samples. 
Figure  2  shows  the  MSTs  for  the  example  data  sets  presented  above  in  Section  A. 


Figure  2.  Minimum  spanning  trees  for  example  data  sets.  The  no-change  case 

is  to  the  left;  the  change  case  is  to  the  right. 

Using  mi  =  mi  =  10  for  both  cases,  the  no-change  case  and  the  change  case  yield  R  values 
of  12  and  3,  respectively.  The  resulting  p- values  from  Friedman  and  Rafsky’s  test  are 
0.68  and  0.000093,  respectively.  Thus,  the  null  hypothesis  that  F\  =  F2  is  not  rejected  in 
the  no-change  case,  and  it  is  rejected  in  the  change  case.  Zhou,  Zi,  Geng,  and  Li  (2014) 
apply  the  MST  in  a  statistical  process  control  setting  to  detect  change. 

As  suggested  by  John  Tukey,  Friedman  and  Rafsky  (1979)  show  that  including  a 
second  or  third  edge-disjoint  MST  generally  improves  the  heterogeneity-detection  power 
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of  the  test.  Unfortunately,  there  is  no  guarantee  that  one  may  extract  an  additional  edge- 
disjoint  MST  from  the  samples.  Constructing  a  MST  requires  0(|Zs|logiV)  computation 

time  (West,  2001);  for  a  complete  graph,  |C  is  0(/V2). 


4.  Tests  Using  the  Minimum  Matching 

Rosenbaum  (2005)  extends  the  work  of  Friedman  and  Rafsky  by  using  a 
minimum-matching  methodology  and  producing  the  cross-match  test,  which,  unlike  the 
Friedman  and  Rafsky  test,  is  distribution-free.  Again,  the  data  in  question  are  known  to 
be  from  two  samples.  For  the  sake  of  discussion,  we  assume  N  is  even,  although  an  odd 
value  of  N  may  also  be  accommodated.  Instead  of  minimum  spanning  trees,  the  cross¬ 
match  test  uses  the  edges  of  the  minimum  non-bipartite  matching  (MNBM).  Non- 
bipartite  matching  refers  to  pairing  observations  without  regard  to  their  sample  identity. 
The  MNBM  has  cardinality  |£MNBM|  =  N/2. 


The  context  again  is  the  two- sample  problem,  in  which  there  are  two 
p-dimensional  samples  of  known  size  (mvm2)  drawn  from  two  unknown  distributions 

(FvF2).  We  represent  the  observations  as  a  complete,  undirected  graph  (V,  E ),  with  the 

vertex  set  V referenced  as  i  or  j  and  the  edge  set  E  referenced  by  its  endpoints,  edge  { i,  /}. 
Using,  for  example,  the  Euclidean  distance  metric,  we  define  the  edge  weights  dy  as  the 
distances  between  observations.  A  set  of  binary  edge  indicator  variables  xy  indicates 
MNBM  membership,  with  xy  =  xy: 


V{i,;}s£ 


(l.  if  f-d e  MNBM  ,  , 

x..=<  L  J  ,  v\i,j\eE. 

|o,  otherwise 


(5) 


In  matrix  form,  X  is  also  known  as  the  MNBM  adjacency  matrix.  We  solve  the  following 
integer  linear  program  for  X,  the  edge  indicators  of  the  MNBM: 
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(6) 


min  Y  cLx,; 

Y  lJ  u 

{iJ^E 

S.t.  Xxy=1’  V ieV 

JtV 

x0e  {0,!},  V  {/,  j}e  E. 


In  a  MNBM,  observations  are  more  likely  to  be  adjacent  to  observations  drawn 
from  their  same  distribution  rather  than  observations  drawn  from  the  other,  as  explained 
in  Rosenbaum  (2005).  The  cross-match  statistic  A  counts  those  edges  in  the  MNBM  that 
include  an  endpoint  from  each  sample.  A  smaller  value  of  the  statistic,  i.e.,  more  edges 
within  samples  as  opposed  to  across  samples,  provides  evidence  against  identical  source 
distributions.  Under  the  null  hypothesis,  the  probability  of  exactly  a  edges  connecting 
observations  from  the  two  samples  is 


P(A  =  a)  ■ 


2a(N/2)l 


N 

Vm2 


((mx-a)l2)\a\((m2-a)l2)\ 


0  <  a  <  min 


(7) 


Figure  3  shows  the  MNBMs  for  the  example  data  sets  of  Section  A. 


Figure  3.  Minimum  non-bipartite  matchings  for  example  data  sets.  The  no¬ 
change  case  is  to  the  left;  the  change  case  is  to  the  right. 
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Using  10  for  both  sample  sizes  m\  and  m2,  the  no-change  case  and  the  change  case  yield 
A  values  of  4  and  0,  respectively.  The  resulting  p-values  from  Rosenbaum’s  test  are  0.43 
and  0.0014,  respectively,  failing  to  reject  the  null  hypothesis  in  the  no-change  case,  and 
rejecting  it  in  the  change  case.  The  empirically  observed  computation  time  required  to 
extract  a  MNBM  with  the  Kolmogorov  (2009)  implementation  of  the  Blossom  algorithm 
is  on  the  order  of  iV“  log  N. 

Other  minimum  subgraphs  have  been  considered  for  change  detection  as  well. 
Biswas,  Mukhopadhyay,  and  Ghosh  (2014)  propose  a  two-sample  statistic  based  on  the 
approximate  shortest  Hamiltonian  path  as  defined  earlier  in  this  section.  The  shortest 
Hamiltonian  path  problem  is  non-deterministic  polynomial-time  (NP)  hard;  the  authors 
employ  a  suboptimal  heuristic  algorithm.  In  higher  dimensions,  their  test  shows  power 
superior  to  both  Friedman  and  Rafsky’s  MST  test  and  Rosenbaum’s  cross-match  test. 

Recent  work  extends  these  two-sample  tests  into  the  realm  of  change  detection 
and  localization.  Ruth  (2009)  and  Ruth  and  Koyak  (2011)  generalize  Rosenbaum’s  cross¬ 
match  test  by  considering  a  sequence  of  observations  instead  of  two  samples.  Their  exact 
test  iteratively  divides  sequentially  ordered  observations  into  two  subsets,  the  first  n\ 

observations  and  the  last  m2=N-ml  observations,  \/ml  e  {2,3,...,lV-l}.  Each  two- 

subset  sequence  is  evaluated  using  the  edges  of  the  same  MNBM  to  yield  a  set  of  cross¬ 
match  statistics,  which  detect  change  in  an  interval  of  possible  change  points. 

Under  the  alternative  hypothesis  of  distributional  change  in  an  observation 
sequence,  a  MNBM  tends  to  pair  observations  that  are  closer  in  sequence  than  would  be 
expected  under  the  null  hypothesis.  The  sum  of  pair-differences  across  all  pairs  should 
accordingly  be  smaller  in  situations  when  the  null  hypothesis  should  be  rejected.  Ruth 
(2009)  considers  an  equivalent  statistic,  the  sum  of  pair- maxima  (SPM): 

r»=£i>r  ® 

i= 2  j= 1 

Ruth  establishes  a  normal  approximation  rejection  criterion  for  Tn  that  is  valid  under  the 
null  hypothesis  that  there  is  no  change.  For  the  example  data  of  Section  A,  the  no-change 
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case  and  the  change  case  yield  T20  values  of  134  and  127,  respectively.  From  Ruth 
(2009),  the  critical  value  for  73 o  at  the  0.05  test  level  is  129,  so  we  reject  the  null 
hypothesis  at  the  0.05  test  level  for  the  change  case. 

Chen  and  Zhang  (2015)  propose  a  statistic  based  on  the  number  of  cross-matches 
in  a  MST,  MNBM,  or  undirected  nearest-neighbor  subgraph  by  allowing  the  first  sample 
to  consist  of  vertices  {1,  0  <  j  <  N.  They  use  their  statistic  to  determine  single  or 

multiple  change  points,  in  both  low-  and  high-dimensional  data.  Like  Friedman  and 
Rafsky  (1979),  Chen  and  Zhang  demonstrate  increased  power  for  richer  graph  structures. 
They  consider  three  edge-disjoint  MSTs,  MNBMs,  and  undirected  nearest-neighbors 
subgraphs. 

5.  Tests  Using  Ensembles  and  Factors 

Graph-theoretic  tests  derive  their  power  from  the  idea  that  when  observations  are 
drawn  from  two  different  distributions,  the  edges  of  a  minimum  subgraph  such  as  a  MST 
or  MNBM  tend  to  associate  observations  from  the  same  distribution  more  often  than 
those  from  different  distributions.  Edge-disjoint  ensembles  further  exploit  this  idea,  as  the 
use  of  second  and  third  edge-disjoint  MSTs  show  in  Friedman  and  Rafsky  (1979)  and 
Chen  and  Zhang  (2015).  Ruth  (2009)  and  Ruth  and  Koyak  (2011)  show  that,  compared  to 
a  single  MNBM,  ensembles  of  higher  order  provide  substantial  increases  in  power. 

Ensembles  are  obtained  recursively  as  follows: 

(1)  Start  with  the  complete  graph. 

(2)  Extract  the  MNBM  using  the  Blossom  algorithm  of  Kolmogorov  (2009). 

(3)  Remove  the  MNBM  edges  from  the  graph;  they  are  part  of  the  ensemble. 

(4)  Repeat  steps  (2)  and  (3)  until  the  desired  ensemble  order  is  achieved. 

Let  k  denote  the  number  of  edge-disjoint  MNBMs  in  an  ensemble.  Let  kMNBM 
refer  to  an  ensemble  of  k  different  MNBMs.  The  edge  cardinality  of  a  kMNBM  is 
E/ainbm  |  =  kN/2.  If  the  number  of  observations  N  is  even,  than  N/2  distinct  edge-disjoint 
subgraphs  are  guaranteed  to  be  obtainable  (Anderson,  1972). 
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Ruth  (2009)  introduces  the  ensemble  sum  of  pair-maxima  (ESPM)  test  based  on 
forming  k  =  N/2  edge-disjoint  MNBMs  and  the  previously  discussed  SPM  statistic.  The 
cumulative  sum  of  pair-maxima  over  the  first  k  MNBM  is 

k  =  l,...,N/2,  (9) 

i=  i 


where  7}  is  the  SPM  statistic  for  the  jth  MNBM  of  the  ensemble.  Ruth  expresses  the 
ESPM  statistic  as  follows: 


B  =  max 

Ae{l,2,K  N/2] 


where  c  =  (N  -l) 


E[Sk]~Sk 

v  c_ _ J_ 

N(N  +  l) 


180 


(10) 


The  similarity  of  sequential  values  of  Sk  to  a  Brownian  bridge  motivates  the  scaling 
constant  c  and  the  formulation  for  the  ESPM  statistic,  as  shown  in  Ruth  (2009).  The 
ESPM  statistic  is  not  distribution-free  under  the  null  hypothesis  that  all  observations  are 
from  the  same  distribution. 


The  ESPM  statistic  achieves  desirable  power — essentially  matching  the  change- 
detection  power  of  JJS  under  JJS-optimal  conditions,  and  exceeding  parametric  methods 
when  applied  to  non-normally-distributed  data.  Ruth  and  Koyak  (2011)  demonstrate  that 
the  power  of  an  ensemble -based  test  increases  as  the  number  of  matches  increases.  The 
empirically  observed  computation  time  for  recursive  matching  is  on  the  order  of  N  4,  a 
consequential  drawback  for  larger  data  sets. 

Bodgan  (2014)  applies  the  ESPM  to  sequential  testing,  insofar  as  proposing 
appropriate  methods  to  maintain  test  level  and  achieve  reasonable  power  during  multiple 
testing.  This  approach  shows  promise  to  adapt  other  static  methods  to  sequential 
scenarios,  while  recognizing  the  need  for  more  computationally  efficient  graph-theoretic 
approaches. 

Is  there  a  statistic  that  achieves  the  detection  power  of  the  ESPM,  yet  comes  from 
an  alternative  subgraph  extracted  in  a  way  that  does  not  require  recursive  matching?  We 
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propose  advancing  directly  to  a  minimum  ^-regular  spanning  subgraph.  Such  a  subgraph, 
referred  to  as  the  minimum  ^-factor,  shares  the  same  edge  cardinality  as  the  AMNBM. 
Unlike  the  kMNBM,  the  minimum  k-factor  is  not  necessarily  decomposable  into  a  series 
of  k  edge-disjoint  MNBMs  and  is  not  recursively  determined.  Using  the  set  of  binary 
edge  indicator  variables  x y,  we  obtain  the  minimum  /c- factor  adjacency  matrix  (X)  by 
making  the  following  slight  modification  to  the  optimization  problem  of  (6): 

min  V  cL.x.. 
v  ^  y  y 

{i,j]eE 

s.t.  '^xij=k,  \/ieV  (11) 

j£V 

Xij  e  {0,1},  V{/,y}e  E. 

In  Chapter  III,  we  demonstrate  that  by  using  optimization  software  such  as  CPLEX 
(IBM,  2013),  obtaining  a  minimum  /.'-factor  has  an  empirically  observed  computation 
time  on  the  order  of  N  '  . 

As  discussed  in  more  detail  in  Chapter  III,  we  develop  a  statistic  based  on  an  edge 
scoring  function  h(i,  j )  applied  to  the  edge  indicator  xy. 

T=  X  h{^j)xr  (12) 

{'j}e£ 

Some  example  candidate  scoring  functions  already  discussed  include  a  cross-match 
indicator  and  the  pair-maximum: 

.  .  fl,  if  vertices  i  and  j  are  from  different  samples 

h(i,j)  =  < 

[0,  otherwise  (13) 

h(i,j)  =  max  {/,/}. 

6.  Tests  Using  Nearest-Neighbor  Graphs 

Statistics  derived  from  nearest  neighbors  (NN)  have  also  been  used  in  change- 
detection  problems.  NN  is  a  subgraph  of  the  complete  directed  graph  with  | V  |  directed 
edges,  one  from  each  vertex,  such  that  each  edge  (i,j)  means  vertex  j  is  the  NN  of  vertex 
i.  A  NN  graph  is  not  symmetric;  (i,j)e  ENN  does  not  imply  (  /,/)e  Em.  NN  graph 
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vertices  all  have  outdegree  one,  but  their  indegree  may  take  any  integer  value  on 
[0,  A  - 1].  Figure  4  shows  the  NN  graphs  for  the  example  data  sets  of  Section  A. 


Figure  4.  Nearest-neighbor  graphs  for  example  data  sets.  The  no-change  case 

is  to  the  left;  the  change  case  is  to  the  right. 


The  undirected  NN  graph  is  a  subgraph  of  the  MST.  A  kNN  graph  includes  edges  to  the 
/c- nearest  neighbors  of  each  vertex  and  has  edge  cardinality  \Ekm  \  =  kN . 


Most  applications  of  NN  fall  generally  into  the  areas  of  classification  and 
regression.  The  use  of  NN  in  classification,  often  associated  with  machine  learning  and 
pattern  recognition,  involves  assigning  an  observation  to  a  group  based  on  the  group 
identification  of  its  NN  (Kotsiantis,  2007).  NN  regression  typically  involves  assigning  a 
value  to  an  observation  based  on  the  value  of  its  NN  (Altman,  1992). 

Schilling  (1986)  and  Henze  (1988)  both  provide  similar  statistics  for  multivariate 
two-sample  testing  using  NN.  Let  A7V,  (r)  be  the  rth  NN  of  observation  i.  We  define 
indicator  variable  /,  (r)  by 


[l,  if  NNi  (r)  belongs  to  the  same  sample  as  observation  i 
[0,  otherwise. 


(14) 
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Now  the  statistic  for  testing  the  null  hypothesis  of  identical  sampling  distributions  is 
given  by  (Henze,  1988) 

T,,„=±±I,(r\  (15) 

i= 1  r= 1 

This  quantity  counts  all  edges  in  which  an  observation  and  its  neighbor  are  from  the  same 
sample.  Large  values  of  T^n  signal  rejection  of  the  null  hypothesis.  Henze  shows  that 
under  the  null  hypothesis,  the  statistic  is  distribution-free  and  asymptotically  normal, 
although  permutation  testing  is  recommended  for  smaller  sample  sizes.  Using  the  single 
NN  as  shown  in  Figure  4,  and  a  sample  size  of  10  for  both  mi  and  m2,  the  no-change  case 
and  the  change  case  yield  Tj^o  values  of  8  and  19,  respectively.  The  resulting  permutation 
p-values  from  the  test  are  0.76  and  0.0002,  respectively.  We  fail  to  reject  the  null 
hypothesis  in  the  no-change  case,  and  reject  it  in  the  change  case.  Schilling  conducts  a 
simulation  study  of  the  detection  power  of  7jyv  for  location  and  scale  changes, 
demonstrating  an  increase  in  change-detection  power  as  k  increases  from  one  to  three. 

Hall  and  Tajvidi  (2002)  propose  a  NN-based  approach  to  the  multivariate  two- 
sample  problem,  calibrated  by  permutation  testing.  Consider  a  pooled  sample  'T/of  size  N 
composed  of  two  independent  random  samples  Y \  and  Yi  of  size  m\  and  m2  from 
distributions  F\  and  Fi,  respectively.  Define  quantity  M\_,  (k)  as  the  number  of  F2 
observations  that  are  ANN  of  observation  Y\A,  1  <  /  <  m] ,  1  <  A-  <  TV  —  1  (and  similarly 

define  M2j(k)).  Under  the  null  hypothesis,  conditional  on  the  pooled  sample,  M\j(k )  has 
a  hypergeometric  distribution  (as  does  AT, (A)): 


which  implies  that,  under  the  null  hypothesis: 

E[Mu(k)fr]  =  j£ 


(17) 
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The  conditional  probability  and  expected  value  of  Mijik)  are  analogous,  interchanging 
nil  and  m2.  The  test  statistic  T  is  based  on  the  absolute  deviations  of  M\  and  M2  from  their 
expected  value  under  the  null  hypothesis,  denoted  as  DM\  and  DMn  respectively.  Hall 
and  Tajvidi  include  y>  0  and  nonnegative  weights  w  1  and  w’2  as  adjustable  parameters  for 
T: 


1  mj  m2  1  m2  my 

r=-X£DM.,(*F  «'.(*)+ — 21dm,,  (kfw2(k). 

m2  (=1  k= 1 


(18) 


m\  (=1  k= 1 


Hall  and  Tajvidi  use  permutation  testing  to  obtain  quantiles  for  the  test  statistic, 
conditional  on  the  set  of  pairwise  distances  in  <$/.  In  a  simulation  study,  they  determine 
that  the  simplest  case  of  y=  1  and  constant  weight  functions  can  be  used. 

In  the  two-sample  context,  NN  directed  edges,  contrasted  with  MNBM  undirected 
edges,  present  additional  opportunities  for  constructing  a  cross-match  test.  Extracting  a 
MNBM  results  in  two  types  of  undirected  edges:  within-group  ( F{  — >  F1 )  and  across- 

group  (Fj  — >  F,  or  F2  — >  Fx ) .  Extracting  a  NN  subgraph  results  in  within-group  directed 
edges  (Fj  — >Fj)  as  well  as  two  distinct  types  of  across-group  directed  edges:  Fl  — >  F, 
and  F2  — >  Fx.  This  additional  information,  obtained  with  relatively  low  computation  time, 

may  result  in  cross-match  tests  that  compare  favorably  with  Rosenbaum’s  test  in  terms  of 
power  and  computational  efficiency.  Vaidya  (1989)  demonstrates  finding  the  exact  A'NN 
of  each  of  N  observations  in  O  ( kN  log  N )  computation  time.  Approximate  algorithms 

provide  a  A'NN  computation  time  of  C)((N  +  k)  p log  N ) ,  which  is  a  reduction  unless 
p  >  kN/(N  +  k )  (Arya,  Mount,  Netanyahu,  Silverman,  &  Wu,  1998). 


7.  Change  Detection  in  Higher  Dimensions 

Graph-theoretic  tests  are  subject  to  the  curse  of  dimensionality  (Bellman,  1961, 
see  Appendix  A).  The  sparseness  of  multivariate  samples  grows  in  higher  dimensions, 
which  reduces  the  power  of  Friedman  and  Rafsky’s  MST  test  and  Rosenbaum’s  cross¬ 
match  test.  Chen  and  Friedman  (2015)  derive  additional  statistics  from  the  MST  and 
MNBM  to  compensate  for  this  aspect  of  the  curse  of  dimensionality.  Ruth  (2009)  finds  a 

21 


reduction  in  the  change-detection  power  of  the  ESPM  statistic  when  moving  from  five¬ 
dimensional  to  20-dimensional  observations.  Beyer,  Goldstein,  Ramakrishnan,  and  Shaft 
(1999)  discuss  the  utility  of  NN  as  dimensionality  increases.  They  find  that  for  sample 
sizes  of  one  million  observations,  the  distance  to  the  NN  approaches  the  distance  to  the 
farthest  neighbor  in  as  few  as  10  to  15  dimensions. 


Table  2.  Graph-theory  terminology. 


Term 

Definition 

graph 

a  vertex  set,  an  edge  set,  and  a  relationship  between  vertices  and  edges 

neighbors;  adjacent 

two  vertices  connected  by  an  edge 

vertex  degree 

the  number  of  incident  edges 

regular  graph 

a  graph  in  which  all  vertices  have  the  same  degree 

directed  graph 

a  graph  with  directed  edges,  specifying  an  ordered  pair  of  vertices  for 
each  edge:  (tail,  head) 

vertex  i  outdegree 

the  number  of  edges  with  tail  i 

vertex  i  indegree 

the  number  of  edges  with  head  i 

edge  weight 

a  nonnegative  measure  of  distance  or  cost  assigned  to  each  edge 

complete  graph 

a  graph  with  edges  between  all  possible  vertex  pairs 

subgraph 

subset  of  a  graph;  the  original  graph  contains  the  subgraph 

spanning 

including  all  vertices  of  the  original  graph 

factor 

spanning  subgraph 

k-  factor 

spanning  ^-regular  subgraph 

minimum  spanning 
subgraph 

spanning  subgraph  that  minimizes  the  sum  of  the  included  edge 
weights 

disjoint 

subgraphs  having  no  vertices  in  common 

edge-disjoint 

subgraphs  having  no  edges  in  common 

ensemble 

two  or  more  edge-disjoint  subgraphs  with  identical  vertex  sets 

matching 

a  set  of  edges  with  no  shared  endpoints 

perfect  matching 

a  matching  incident  to  every  vertex 

nearest  neighbor 
subgraph 

a  subgraph  including  only  those  edges  connecting  each  observation  to 
its  nearest  (minimum  edge  weight)  neighbor. 

path 

a  graph  whose  vertices  are  adjacent  if  and  only  if  they  are  consecutive 
in  an  ordered  list 

Hamiltonian  path 

a  spanning  path 

cycle 

a  closed  path  where  the  first  and  last  vertices  are  the  same 

connected  graph 

a  graph  containing  at  least  one  path  from  each  vertex  to  every  other 
vertex 

tree 

an  acyclic  connected  graph 

spanning  tree 

a  spanning  subgraph  that  is  a  tree 

minimum  spanning 

the  spanning  tree  that  minimizes  the  sum  of  the  included  edge  weights 

tree 
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III.  NEW  APPROACHES  TO  GRAPH-THEORETIC  CHANGE 

DETECTION 


In  this  chapter,  we  introduce  new  statistical  tests  for  change  detection  based  on 
the  subgraphs  introduced  in  Chapter  II.  We  also  consider  the  effects  of  varying  the  match 
degree,  how  best  to  score  edges  in  the  subgraph,  and  how  to  obtain  a  p- value  for  testing 
purposes.  Finally,  using  simulated  data,  we  estimate  the  change-detection  power  of  our 
proposed  test  statistics,  comparing  them  to  parametric  and  nonparametric  alternatives. 

A.  MOTIVATION 

In  Chapter  II,  we  establish  that  graph-theoretic  methods  have  the  ability  to  detect 
change,  but  that  the  most  powerful  methods  (as  demonstrated  by  Ruth,  2009  and  Ruth  & 
Koyak,  2011)  are  also  computationally  intensive.  For  example,  Ruth  (2009)  found  the 
Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  statistic  to  have  an  empirically  observed 
computation  time  on  the  order  of  N4.  We  ask  the  questions:  instead  of  recursively 
forming  A  edge-disjoint  matchings,  do  we  maintain  power  and  reduce  computational 
expense  by  directly  forming  a  minimum  graph  factor  of  degree  k,  and  deriving  a  single 
statistic  for  that?  If  we  are  interested  in  computational  ease,  does  a  statistic  based  on  a 
A-nearest-neighbors  (ANN)  subgraph  provide  power  comparable  to  a  A-factor? 

B.  ALTERNATIVE  MULTIPLE-MATCHING  SUBGRAPHS 

We  propose  and  evaluate  two  new  families  of  graph-theoretic  statistics  based  on 
the  undirected  edges  of  the  minimum  A-factor,  where  A  is  the  common  degree  of  the 
vertices,  and  the  directed  edges  of  the  ANN  subgraph. 

1.  Minimum  A-Factor 

Consider  the  number  of  observations  N  to  be  even.  As  established  in  Chapter  II 
Section  D,  at  least  N/2  edge-disjoint  non-bipartite  matchings  may  be  recursively  obtained 
from  a  complete  graph.  Instead  of  serial  extraction  of  an  ensemble,  we  proceed  directly  to 
a  minimum  A-factor.  For  A  =  N/2,  this  subgraph  is  denoted  the  minimum  half  factor, 
based  on  the  minimum-distance  pairing  for  the  data.  The  distance  between  observations  i 
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and  j,  using  Euclidean,  Mahalanobis,  or  another  distance  metric,  is  represented  as  dij. 
Using  the  set  of  binary  edge  indicator  variables  xy,  we  obtain  the  general  minimum 
k- factor  adjacency  matrix  (X)  by  solving  an  integer  linear  program: 

min  Y 

Y  w  y  y 

s.t.  ^ ]xij=k ,  VieV  (19) 

j£V 

Xyt  {0,1},  v {/,/}-£  E. 

The  edge  cardinality  of  the  minimum  Ufactor  is  Nkl 2.  Figure  5  shows  the  minimum  half 
factor  for  the  example  data  sets  from  Chapter  II  Section  A.  The  change  case  is 
distinguished  by  an  uneven  concentration  of  edges  compared  to  the  no-change  case. 
Table  3  lists  the  10  matches  for  each  observation.  The  no-change  case  has  56  of  its  100 
edges  matching  vertices  from  different  samples,  while  the  change  case  has  16  such  edges. 


Figure  5.  Minimum  half  factor  for  example  data  sets,  plotted  in  a  circular 
layout  for  clarity.  The  no-change  case  is  to  the  left;  the  change  case  is  to 

the  right. 
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Table  3.  Minimum  half  factor  matches  by  vertex  (z)  for  example  data  sets. 

The  no-change  case  is  to  the  left;  the  change  case  is  to  the  right. 
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2.  Multiple  Nearest  Neighbors 

In  addition  to  statistics  based  on  a  minimum  k-factor,  we  also  explore  statistics 
based  on  a  nearest  neighbor  (NN)  subgraph.  NN  subgraphs  have  two  advantages  over 
minimum  ^-factors.  First,  there  is  additional  information  to  exploit  as  the  edges  are 
directed,  and  second,  finding  NN  is  known  to  have  lower  computational  costs  compared 
to  minimum  k-factors  as  discussed  earlier.  We  define  the  kNN  edge  set  as  the  directed 
edges  from  the  entire  edge  set  E  that  are  indicated  by 


J 1,  if  j  is  a  A: -nearest  neighbor  of  i 
[0,  otherwise 


V(z,  j)e  E. 


(20) 


The  edge  cardinality  of  the  kNN  subgraph  is  Nk.  Figure  6  shows  the  k=  10  NN  graph  for 
the  bivariate  example  from  Chapter  II  Section  A.  Again,  the  difference  between  the 
change  and  no-change  cases  is  apparent,  but  not  as  sharp  as  for  the  minimum  half  factor. 
Table  4  lists  the  10  NN  for  each  observation. 
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Figure  6.  Nearest-neighbors  graphs  with  k=  10  for  example  data  sets,  plotted 
in  a  circular  layout  for  clarity.  The  no-change  case  is  to  the  left;  the 
change  case  is  to  the  right. 


Table  4.  The  10-nearest  neighbors  of  each  vertex  (;)  for  example  data  sets. 

The  no-change  case  is  to  the  left;  the  change  case  is  to  the  right. 
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The  no-change  case  has  1 1 1  of  its  200  edges  matching  vertices  from  different  samples, 
while  the  change  case  has  46  such  edges.  Centrally  located  observations,  such  as  2  and  15 
in  the  no-change  case,  are  10-NN  of  more  than  ten  other  observations;  this  high  NN 
indegree  is  not  indicative  of  a  violation  of  homogeneity. 

3.  Computation  Time  Considerations 

Before  developing  statistics,  we  establish  the  relative  computation  effort  for 
generating  minimum  subgraphs.  For  comparison,  our  computations  were  completed  on  a 
64-bit  Dell  OptiPlex  990  Windows  7  platform  with  an  Intel  Core  i5-2400  3.1  GHz 
processor,  8  GB  RAM,  using  R  (R  Core  Team,  2013)  version  3.0.1  with  packages 
cplexAPI  (Gelius-Dietrich,  2013)  and  FNN  (Beygelzimer,  Kakadet,  Langford,  Arya, 
Mount,  &  Li,  2013)  and  using  CPLEX  version  12.5. 

It  is  computationally  less  expensive  to  directly  generate  a  minimum  /.'-factor  than 
to  sequentially  generate  k  edge-disjoint  minimum  non-bipartite  matchings  (kMNBM). 
The  Blossom  algorithm  (Kolmogorov,  2009)  is  more  efficient  at  extracting  a  single 
MNBM  than  using  a  general-purpose  integer  linear  program  solver  such  as  CPLEX.  As 
the  value  of  k  increases,  the  CPLEX  solver  is  much  more  effective  at  obtaining  the 
minimum  k-factor  than  recursive  use  of  the  Blossom  algorithm.  We  demonstrate  this  in 
Figure  7.  The  times  are  the  medians  observed  from  100  simulations  at  each  value  of 

k  e  {1,2,3,4,5,10,25,50,.  ..,250},  in  each  instance  using  500,  10-dimensional  independent 

standard  normal  observations.  The  CPLEX  computation  times  are  relatively  constant 
with  respect  to  k,  while  the  computation  times  of  recursive  use  of  the  Blossom  algorithm 
increase  in  an  almost  linear  manner.  Although  this  is  based  on  a  sample  size  of  500,  we 
have  observed  essentially  the  same  result  using  different  values  of  N. 
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Figure  7.  Computation  time  to  obtain  a  minimum  ^-factor  for  N  =  500, 
10-dimensional  observations  sampled  from  a  multivariate  normal 
distribution,  for  the  CPLEX  algorithm  and  the  recursive  (edge-disjoint) 
Blossom  V  algorithm.  The  times  are  medians  observed  from  100 

simulations  at  each  value  of  k,  k  e  jl,2,3,4,5,10,25,50,...,250j. 

For  the  datasets  considered,  we  empirically  observe  that  the  computation  time  for 
an  edge-disjoint  ensemble  (kMNBM)  is  on  the  order  of  N4,  and  that  the  computation 
times  for  minimum  half  factor  and  /cNN  subgraphs  are  both  on  the  order  of  TV"'  .  Figure  8 
shows  the  median  computation  times  required  to  extract  the  minimum  subgraph  for 
different  sample  sizes,  plotted  on  logarithmic  scales.  Again,  we  use  100  simulations  of 
10-dimensional  data  at  each  dataset  size.  Although  the  linear  regression  lines  for  the 
minimum  half  factor  and  kNN  series  are  nearly  parallel,  the  ratio  of  their  computation 
times  is  about  40,  revealing  a  substantial  computational  advantage  in  favor  of  /cNN.  For 
example,  with  N  =  1,000  observations  and  k  =  N/2,  solving  for  the  minimum  half  factor 
takes  about  12  seconds  while  solving  for  the  kNN  takes  about  0.25  seconds.  In  contrast, 
the  kMNBM  takes  almost  1 1  minutes  to  solve. 
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Figure  8.  Comparison  of  median  computation  times  for  fcMNBM,  minimum 
half  factor  and  the  A'NN  vs.  sample  size,  plotted  on  a  log-log  scale  with 
fitted  regression  lines,  for  k  =  N/2,  with  100  simulations  per  sample 
size.  The  regression  line  slopes  for  the  &MNBM,  minimum  half  factor 
and  kNN  are  4.21,  2.60,  and  2.66,  respectively. 


Table  5  shows  in  more  detail  the  computational  effort  for  the  fcMNBM,  minimum 
half  factor,  and  &NN  for  k  =  N/2  as  N  increases.  The  times  in  Table  5  are  the  quantiles 
observed  from  100  simulations  using  10-dimensional  data.  The  branch-and-bound  exact 
algorithm  for  the  minimum  half  factor  leads  to  considerable  variability  in  computational 
effort. 
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Table  5.  Quantiles  of  observed  computation  times  (seconds)  for  k  edge-disjoint  minimum  non-bipartite  matchings 

(TMNBM),  minimum  k- factor  (kF),  and  ^-nearest  neighbors  (&NN)  for  k  =  N/2  as  data  size  (N)  varies,  observed 

in  100  simulations  of  10-dimensional  data  for  each  value  of  N. 

[)ala  Quantiles  of  observed  computation  times  (seconds) 


Size 

Maximum 

0.90 

0.75 

0.50  (Median) 

0.25 

0.10 

Minimum 

(AO 

AMNBM 

AF 

ANN 

AMNBM 

1  kF 

ANN  AMNBM 

[  kF 

ANN 

AMNBM 

kF 

ANN 

AMNBM 

1  kF 

ANN  AMNBM 

1  AF 

ANN 

AMNBM 

AF 

ANN 

100 

0.07 

0.13 

0.02 

0.05 

0.08 

* 

0.05 

0.07 

* 

0.04 

0.03 

* 

0.04 

0.03 

* 

0.03 

0.03 

* 

0.03 

0.03 

* 

160 

0.33 

0.23 

0.02 

0.25 

0.21 

0.02 

0.25 

0.20 

* 

0.24 

0.11 

* 

0.23 

0.09 

* 

0.22 

0.09 

* 

0.20 

0.09 

* 

200 

0.63 

0.36 

0.02 

0.59 

0.33 

0.02 

0.57 

0.22 

0.01 

0.55 

0.18 

* 

0.53 

0.17 

* 

0.51 

0.17 

* 

0.49 

0.15 

* 

300 

2.86 

0.97 

0.02 

2.77 

0.91 

0.02 

2.73 

0.84 

0.02 

2.63 

0.55 

0.01 

2.56 

0.50 

0.01 

2.48 

0.48 

* 

2.35 

0.45 

* 

400 

9.88 

1.94 

0.04 

9.25 

1.76 

0.03 

8.98 

1.63 

0.03 

8.75 

1.08 

0.03 

8.46 

0.99 

0.02 

8.26 

0.96 

0.01 

7.72 

0.89 

0.01 

500 

28.38 

3.26 

0.06 

26.99 

3.01 

0.05 

26.48 

2.78 

0.05 

25.95 

1.91 

0.04 

25.34 

1.73 

0.03 

24.82 

1.68 

0.03 

23.26 

1.59 

0.03 

600 

68.92 

8.21 

0.08 

64.05 

4.76 

0.08 

63.08 

4.51 

0.07 

61.92 

2.99 

0.06 

60.38 

2.81 

0.06 

59.03 

2.73 

0.06 

56.56 

2.66 

0.06 

1,000 

703.8 

24.46 

0.29 

675.0 

16.67 

0.28 

663.05 

15.86 

0.27 

653.6 

12.02 

0.25 

635.8 

11.05 

0.25 

623.2 

10.50 

0.25 

610.7 

10.09 

0.23 

2,000 

# 

114.7 

1.87 

# 

98.51 

1.81 

# 

92.87 

1.79 

# 

77.21 

1.77 

# 

70.40 

1.74 

# 

66.65 

1.71 

# 

61.67 

1.67 

*  Some  &NN  computations  were  less  than  one-hundredth  of  a  second. 

#  Computations  of  &MNBM  were  not  conducted  for  N  =  2,000. 
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c. 


TEST  STATISTICS:  FORMATION  AND  PROPERTIES 


In  this  section,  we  introduce  test  statistics  based  on  the  minimum  A-factor  and 
ANN  subgraphs  and  discuss  general  properties.  Our  test  statistics  take  the  form  of  using 
information  about  the  vertex  labels  of  the  minimum  subgraph  edges.  While  we  derive 
uni  modal  statistics,  they  are  not  necessarily  normally  distributed. 

1.  General  Form  of  Test  Statistics 

In  Chapter  II,  we  introduce  Rosenbaum’s  cross-match  test  (Rosenbaum,  2005). 
Following  extraction  of  the  MNBM,  the  edges  are  scored  according  to  the  sample  identity 
of  their  endpoints: 

/  \  1 1,  if  vertices  i  and  /'  are  from  different  samples 

h(i,j)  =  \„  u  .  (21) 

[0,  otherwise. 

In  the  case  of  directed  graphs,  (/,  j)  implies  “from  i  to  /.”  Other  score  functions  may  also 
be  considered.  In  the  sum  of  pair-maxima  test  of  Ruth  (2009),  the  score  function  used  is 
h(i,j)  =  max)  A/}.  For  a  general  score  function  h(i,j)  we  define  the  following  test 
statistic: 

r  =  X  (22) 

Z  i= 1  j=hj*i 

2.  Properties  of  Test  Statistics 

Properties  of  a  test  statistic  taking  the  form  of  (22)  depend  on  whether  the 
minimum  subgraph  is  a  minimum  A-factor  (undirected  edges)  or  a  ANN  subgraph 
(directed  edges). 

a.  Minimum  k-F actor  Test  Statistic  Characteristics 

For  a  minimum  A-factor  and  unspecified  score  function,  T  has  a  known  expected 
value  and  variance,  which  we  derive  here  (R.  A.  Koyak,  personal  communication, 
January  8,  2014).  Recognizing  the  symmetric  nature  of  the  adjacency  matrix  (X)  for  the 
minimum  A-factor: 
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(23) 


T=\t  1 

^  «= 1  j=hj*i  i= 2  7=1 

In  addition  to  terms  defined  above,  we  let  Vi  represent  the  set  of  vertices  that  in  the 
minimum  ^-factor  are  adjacent  to  vertex  i.  Now  £[.r(/]  =  Pr[y  e  Vt] .  By  our  formulation, 
2^  =  *  Vie  V7 ,  so  then  ^  Pr[_/ e  V)]  =  k.  It  follows  that  Pr[y e  V7]  =  k/(N  —  l), 

i  ^  j.  We  now  obtain  the  following  expression: 

2  tz*(u) 

h=  ‘=2J=l _ 

N(N-l)  (24) 

kN{N-l)h  _kNh 
[  J  2(1V-1)  2 

We  expand  terms  in  the  variance  as  follows: 


M = v  'L'lLh(i>j)xv  =  EE7*2 O'* j)v [xij ] +  X XX h (r> M (r> s)Cov [a » a . 


r=3  y'=l  5=1 


N  N  min(r,r)-l  N  r_{ 

-XX  X  /Hi>)/Hr’5)Cm{v,’xn]+XXX/7(/’r)/7(r’5)Cm;h,’xrJ  ^25) 


i=2  r=2  5=1 


i=3  r=2  5=1 


A/-  i-1  AT  r-1 


+XXX/i(s’^Mr’s)Covh’*rJ+XXXX/i(MMr’s)Cov[:v*™  • 


r=3  5=2  7=1 


i= 2  7=1  r=2  5=1 
r±i  s^i 
r*j  s*j 


The  variance  of  the  edge  indicators  is  expressed  as  follows: 

v[.v,,]  =  Pr[yeV’(]Pr[7gV,]  =  ('-A-Yl--iTj,  V{«,y}e  E.  (26) 

v 7V  1 A  7V  1 

Note  that  V  [v;;]  depends  on  neither  i  nor  y;  we  instead  use  T2 .  Each  of  the  next  four 
terms  of  (25)  involves  shared  vertices,  and  each  covariance  expression  is  the  same: 
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4V~]  =  to[j,se  Vr | j  *  s]  =  j 

Cov\  x  ,x  \  =  Cov\x.,x  \-Cov\x.,x  \  =  Cov\x.,x 

[_  rj  rs  J  L  is  rs  J  |_  ir  rs  J  |_  sj  rs  J 


\N-\)\N-2)  \N-\)  N-2 

Let  i,  j,  r,  s  denote  distinct  vertices.  Now  E^x^x^  J  =  Pr  [j e  Vi  |.v e  V r ]  Pr [,v e  \/] .  We 
find  Pr  [  y  e  Vi  |.ve  Vr]  as  follows: 

k  =  {N-l)Pr[jeVi] 

=  Pr[se  V;|se  Vr]  +  Pr [re  V;|se  P,]  +  (iV-3)Pr[  j e  Vt  |se  Vr] 

=  ^Z^-+(N-3)Pr[je  V, |se  V,];  (28) 

Pr[js  V,|ie  V,]=  /AF~y  +  2  . 

L  (|  (N  —  2)(N  —  3) 

This  leads  to  our  final  covariance  term: 


E[XiJXrs]  = 

Cov[xy,xrs~\  = 


(. N-A)k2+2k 
(N-l)(N-2)(N-3) 

(. N-4)k2+2k  (  k  Y_  2r2 

(N  —  l)(N  —  2)(N  —  3)  {  N-l  I  ~~  (N  —  2)(N  —  3) ' 


We  now  substitute  the  results  of  (27)  and  (29)  into  (25): 


N  i- 1  N  r- 1 


1=2  7=1  iV  Z  i=2  7=1  r=2  5=1 

Card({/,7-}n{r,5})=l 

Card(f\/}n|rvsJJ-0 


To  simplify  the  second  term,  we  define  the  following  two  sums: 
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(31) 


H(-,j)=^h(i,j),  l<  j  <  N -1,  H(-,N)  =  0 

i=j+ 1 

=  2  <i<N,  H(  1,.)  =  0. 

j= i 


We  now  obtain  the  result: 


v[r]: 


1  k  v 


A-l 


1-- 


v  N  —  \j 


v2 

V  U  9 


where  v: 


A-l 


(32) 


+ 


(lV-2)(lV-3) 


(  N  i- 1  3 

V  1=2  M 


Although  we  have  exact  expressions  for  the  expected  value  and  variance  of  T, 
there  is  no  indication  that  T  is  approximately  normally  distributed  in  large  samples.  To 
assess  the  normality  of  T,  we  analyze  the  results  of  simulations  both  graphically  and 
numerically.  Using  h(i,j)  =  \i-j\  (pair-differences)  and  choosing  k  =  N/2,  we  simulate 
100,000  instances  of  A  =200  five-dimensional  multivariate  normal  observations  under 
the  null  hypothesis. 

Figure  9  presents  a  normal  quantile -quantile  plot  for  simulated  values  of  T  scored 
by  pair-differences  with  k  =  N/2.  The  quantile-quantile  plot  has  a  concave  down  shape 
indicating  left-skewness.  A  Shapiro-Wilk  test  for  normality  on  a  500-observation  sample 
of  the  statistic  gives  a  p-value  of  5.3xl0“12,  evidence  that  T  is  unlikely  to  be 
approximately  normally  distributed,  even  in  large  samples. 

Due  to  the  evident  skewness  and  more  generally  the  lack  of  a  normal  distribution 
for  T ,  it  is  inappropriate  to  determine  critical  values  using  a  normal  approximation  with 
the  known  expected  value  and  variance.  We  discuss  methods  to  obtain  a  p- value  for 
testing  purposes  in  Section  D  of  this  chapter. 
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Theoretical  Quantiles 

Figure  9.  Normal  quantile-quantile  plot  for  statistic  T  based  on  100,000  null 
hypothesis  simulations  of  N  =  200  five-dimensional  multivariate  normal 

observations. 


b.  Test  Based  on  Nearest  Neighbors 

Our  &NN  test  statistic  has  a  known  expected  value.  The  expected  value  takes  the 
same  form  as  the  minimum  k-factor  test  statistic,  from  (24): 


Em_kN(N-l)h  _kNh 
[  J  2(iV-l)  2 


(33) 


The  variance  has  no  neatly  expressible  closed  form  due  to  its  dependence  on  the 
conditional  probability  of  observation  i  being  a  NN  of  observation  j,  given  that  j  is  a  NN 
of  i.  This  conditional  probability  is  in  turn  dependent  on  the  unknown  sampling 
distribution  of  the  data  under  the  null  hypothesis.  We  discuss  methods  to  obtain  a  p- value 
for  testing  purposes  in  Section  D  of  this  chapter. 


3.  Best  Formulation  for  Test  Statistics 

We  have  three  unanswered  questions  regarding  our  proposed  statistics: 
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(1) 


Is  there  a  particular  score  function  that  provides  the  best  detection  power? 

(2)  How  can  we  exploit  the  directional  information  from  A'NN  tests? 

(3)  Is  there  a  particular  value  of  k  that  provides  the  best  detection  power  for 
tests  using  either  the  ^-factor  or  /cNN? 

We  use  a  variety  of  simulation  scenarios  to  estimate  change-detection  power, 
similar  to  the  scenarios  of  Ruth  (2009).  Detection  power  is  the  probability  of  rejecting  the 
null  hypothesis  when  the  null  hypothesis  is  false.  The  critical  value  for  a  test  is  based  on 
the  appropriate  quantile  from  100,000  null  hypothesis  simulations. 

In  all  cases,  we  illustrate  the  power  to  detect  change  under  scenarios  in  which 
N  =  200  observations  are  taken  from  our  sample  space  M'' ,  where  p  is  the  observation 
dimension.  For  change  scenarios,  we  place  a  change  point  in  the  middle  of  the 
observation  sequence.  Conducting  10,000  simulations  per  scenario,  we  estimate  detection 
power  as  the  fraction  of  times  that  the  test  correctly  rejects  the  null  hypothesis  when  it  is 
false,  using  a  test  level  of  a  =  0.05 .  We  also  vary  the  following  factors: 

(1)  Sampling  distribution.  We  consider  multivariate  normal  (Fmvn), 
multivariate  normal  mixture  ( Fmix ) ,  and  multivariate  Weibull  (FWeib) 
probability  distribution  families. 

(2)  Dimension.  We  analyze  scenarios  in  two  dimensions,  p  =  5  and  p  =  20. 

(3)  Change  parameter.  We  examine  changes  in  mean  and  scale. 

(4)  Type  of  change.  We  consider  cases  of  abrupt  (jump)  and  gradual  (drift) 
change  in  the  parameter  undertaking  change. 

(5)  Magnitude  of  change  (A).  We  consider  changes  of  varying  magnitudes. 

Our  distributions  are  generated  as  follows.  The  multivariate  distribution  FMVN  is 
based  on  the  density  function: 

/mvn (y;M)  =  ^,2  s  1/2 exp j"(y~ ll)  S"1  (y - h) j  VyeMp,  (34) 
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where  p.  e  M''  is  the  mean  vector,  and  X  e  M pYp  is  the  covariance  matrix.  Under  the  null 
hypothesis,  p  =  0  and  I  =  Ip  (I p  is  the  pXp  identity  matrix)  so  that  y~A^0,/J.  For 
changes  in  mean,  the  first  element  of  the  mean  vector  pA  is  a  value  A  ^  0.  For  changes  in 
scale,  I  =  (l  +  A)2/;). 

The  multivariate  normal  mixture  distribution  Fmix  is  more  outlier-prone  than  the 
multivariate  normal  distribution.  It  is  formed  by  mixing  two  multivariate  normal 
distributions  Np  (0, 1 p )  and  Np  (0,  ) ,  where  \ff>  1.  We  take 


Y  =  [l  +  (^-l)5]Z, 


(35) 


where  B  is  a  Bernoulli  random  variable  with  success  probability  q.  For  our  simulations, 
y/=  4  and  q  =  0. 1 .  For  changes  in  mean,  the  first  element  of  the  mean  vector  is  equal  to 

A;  for  changes  in  scale,  we  substitute  (l  +  A)”/p  for  Ip  in  the  covariance  matrix  of  the 
two  multivariate  normal  distributions  we  are  mixing. 

We  define  FWeib  to  be  the  distribution  on  W  associated  with  p  independent 
identically  distributed  univariate  Weibull  random  variables: 


^Weib  (y>^7’P) 


y 

n 


l-exp< 


'  y, v 


A 


0,  otherwise 


(36) 


where  rj  and  [i  are  the  univariate  Weibull  shape  and  scale  parameters,  respectively, 
y  =  (y1,...,yj,  and  P  =  (j81,...,j8p).  We  set  77  =  1.5,  /?  =1  +  A,  and  /?,.=!,  je{2,...,p}. 


Our  change  magnitudes  (A)  range  from  0  to  1.0  in  increments  of  0.25.  The 
fraction  of  simulations  rejecting  the  null  hypothesis  when  the  null  hypothesis  is  true 
(A  =  0)  is  representative  of  the  false  alarm  rate  for  the  test.  Our  abrupt  changes  are  of 
magnitude  A  and  occur  at  observation  T.  The  drift  changes  begin  at  observation  r, 
increasing  linearly  until  reaching  the  full  value  of  A  at  observation  N.  For  changes  in 
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distribution  mean  and  changes  in  the  scale  of  the  multivariate  Weibull  distribution,  we 
simulate  the  change  in  only  the  first  component  of  each  observation.  For  changes  in  scale 
for  the  multivariate  normal  and  multivariate  normal  mixture  distributions,  we  simulate 
the  change  in  all  components. 

a.  Selection  of  Score  Function 

Is  there  a  particular  score  function  that  provides  the  best  detection  power?  We 
consider  the  sum  of  pair-differences  score  function  as  discussed  above.  To  magnify  or 
suppress  large  label  differences,  raising  the  pair-differences  to  a  power  (other  than  1)  and 
summing  the  result  produces  a  family  of  possible  score  functions,  in  the  manner  of  a 
Box-Cox  transformation  (Box  &  Cox,  1964).  We  investigate  the  following  score 
functions  with  the  minimum  ^-factor  and  the  &NN: 

*(«,;)  =  -  A  ’  V(;,./)e£,  (37) 

log|/-j|,  (A  =  0) 

for  X  e  {-2.0, -1.5, -1.0, -0.9, ...,0.9, 1.0, 1.5, 2.0} . 

We  set  k  =  N/2  and  evaluate  the  change-detection  power  of  the  resulting 
subgraphs  scored  by  the  transformation  of  (37).  For  each  sampling  distribution  and 
dimension,  we  conduct  100,000  null  hypothesis  simulations  of  the  test  statistic  under  the 
null  hypothesis,  using  the  0.05  quantile  of  the  simulation  values  as  a  critical  value  for  an 
a=  0.05-level  test.  For  each  sampling  distribution,  dimension,  level  of  change  magnitude 
(A),  and  family  parameter  (/l),  we  conduct  10,000  simulations  of  N  =  200  observations 

with  a  jump  in  the  first  mean  vector  component  at  observation  r=  101.  As  shown  in 
Table  6,  for  all  scenarios  examined,  the  value  of  X  =  1  provides  at  or  near  the  maximum 
detection  power  for  the  minimum  half  factor.  We  naturally  prefer  no  transformation,  and 
in  any  case  prefer  to  use  an  interpretable  value  (Faraway,  2005).  There  is  a  narrower 
interval  for  the  /:NN-based  test;  the  95  percent  confidence  limits  for  the  score  function 
parameter  providing  the  maximum  power  in  the  vignettes  examined  intersect  at 
[-0.4,  -0.3].  We  choose  X  =  -0.3,  as  the  value  -0.4  is  at  the  lower  end  of  several  vignette 
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confidence  intervals.  The  negative  parameter  value  indicates  that  there  is  better  &NN 
change-detection  performance  when  the  label  differences  are  suppressed  than  otherwise. 


Table  6.  Score  function  parameter  (A)  95  percent  confidence  interval 
providing  the  maximum  detection  power  at  the  0.05  level  of  test. 

Based  on  10,000  simulations  per  vignette  of  a  jump  change  of 
magnitude  A  in  the  first  component  of  the  mean  vector  of  V  =  200 
p-dimensional  simulated  observations  sampled  from  the  specified 

distribution. 


^-factor  (it  =100)  kNN  (it  =100) 

95%  Confidence  Interval  for  A  95%  Confidence  Interval  for  A 


p 

A 

F 

1  MVN 

F 

mix 

F 

1  Weib 

F 

1  MVN 

F 

mix 

F 

1  Weib 

5 

0.25 

[0.2,  2.0] 

[0.0,  2.0] 

[0.3,  2.0] 

[-0.7,  0.7] 

[-0.9,  0.3] 

[-0.5,  0.3] 

0.50 

[0.4,  1.5] 

[0.4,  1.5] 

[0.3,  1.5] 

[-0.4,  0.1] 

[-0.7,  0.0] 

[-0.4,  0.0] 

0.75 

[0.3,  1.0] 

[0.4,  1.5] 

[0.2,  1.5] 

[-0.4,  0.0] 

[-0.6,  -0.2] 

[-0.4,  0.0] 

1.00 

[0.0,  1.5] 

[0.3,  1.5] 

[-0.1,  2.0] 

[-0.4,  0.1] 

[-0.5,  -0.2] 

[-0.5,  0.0] 

20 

0.25 

[0.0,  2.0] 

[-0.4,  2.0] 

[0.2,  2.0] 

[-1.0,  2.0] 

[-2.0,  2.0] 

[-0.7,  0.3] 

0.50 

[0.4,  2.0] 

[0.3,  2.0] 

[0.4,  1.5] 

[-0.5,  0.2] 

[-1.0,  0.7] 

[-0.5,  0.0] 

0.75 

[0.5,  1.5] 

[0.5,  1.5] 

[0.4,  1.5] 

[-0.4,  0.1] 

[-1.0,  -0.2] 

[-0.5, -0.1] 

1.00 

[0.4,  1.5] 

[0.5,  1.5] 

[0.1,  1.5] 

[-0.4,  0.0] 

[-0.9,  -0.3] 

[-0.5, -0.1] 

Figure  10  shows  the  typical  responses  when  evaluating  the  score  function  with  the 
^-factor  and  with  the  kNN.  with  95  percent  confidence  bounds  on  the  detection  power. 
The  detection  power  shown  is  for  the  second  row  of  Table  6:  a  five-dimensional 
multivariate  normal  distribution  with  an  abrupt  change  of  magnitude  0.5  in  the  first  mean 
vector  component  at  observation  101. 
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Figure  10.  Estimated  detection  power  as  a  function  of  the  score  function 

parameter  X,  for  the  N  =  200  multivariate  normal  observations  with  an 
abrupt  change  of  magnitude  0.5  in  the  first  component  of  the  mean 
vector  at  observation  r=  101,  with  10,000  simulations  per  value  of  X 
with  a  0.05  test  level.  The  minimum  half  factor  test  is  to  the  left,  the 
A'NN  (k  =  100)  test  is  to  the  right.  The  dashed  lines  are  the  95  percent 
confidence  limits  for  the  detection  power. 


b.  Use  of  Directional  Information 

We  now  consider  the  use  of  the  directed  edges  of  the  kNN  subgraph  to  construct  a 
test  statistic  that  is  sensitive  to  distributional  change.  Our  second  statistic  takes  advantage 
of  the  directed  nature  of  kNN  subgraph  edges. 

We  obtain  our  statistic  as  follows.  We  take  the  count  of  those  kNN  whose  index  is 
less  than  the  current  index,  for  all  indices.  For  ease  of  discussion,  a  prior  edge  is  an  edge 
of  the  minimum  subgraph  whose  destination  observation  label  is  less  than  the  source 
observation  label.  An  equivalent  score  function  is  h(i,j)  =  I{i>j},  where  /{ }  is  the 
indicator  function  such  that: 


if  A  is  true, 
if  A  is  not  true. 


(38) 


We  call  this  statistic  the  NN  prior-edge  count  (K): 
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(39) 


1  N  N  1  N  i-l 


1=1  j=l,j^i 


2  i=2  j= 1 


The  A'NN  subgraph  edge  cardinality  is  kN,  and  under  the  null  hypothesis,  K  is  halving  a 
count  of  half  of  the  edges. 


E[K]  = 


kN_ 

4  ' 


(40) 


As  explained  earlier  with  T  based  on  &NN,  there  is  no  neatly  expressible  closed  form  for 
the  variance  of  K. 

We  now  have  two  statistics  based  on  the  /cNN  subgraph.  Table  7  shows  that  under 
the  alternative  hypothesis,  the  attained  change-detection  powers  of  NN  statistics  T  and  K 
vary  greatly  for  the  multivariate  normal  case.  For  illustration,  we  use  N  =  200  and  k  =  100 
for  the  A'NN  subgraph,  and  use  T  of  (22). 


Table  7. 


Estimated  power  of  nearest-neighbor  statistics  T  and  K  to  detect  a 
change  of  magnitude  A  at  observation  101  of  200  drawn  from  a 
five-dimensional  multivariate  normal  sampling  distribution,  with 
10,000  simulations  per  vignette.  Jump  in  the  first  component  of  the 
mean  vector  is  to  the  left;  jump  in  all  components  of  the  scale  is  to 

the  right. 


Emvn;  p 

A 

0.00 

0.25 

0.50 

0.75 

1.00 


Mean  Jump 

T 

K 

0.05 

0.05 

0.11 

0.05 

0.38 

0.05 

0.80 

0.05 

0.98 

0.05 

Scale  Jump 

T 

K 

0.05 

0.05 

0.22 

0.99 

0.82 

1.00 

1.00 

1.00 

1.00 

1.00 

The  T  statistic  shows  power  in  detecting  both  location  and  scale  changes.  The  K 
statistic  is  seen  to  be  effective  for  detecting  scale  changes,  while  exhibiting  no  power  to 
detect  location  changes.  Combining  T  and  K  allows  for  both  location  and  scale  detection. 
We  call  this  combined  test  the  composite  kNN  test.  To  maintain  an  overall  test  level  of 
a  =0.05,  we  use  the  Bonferroni  adjustment  based  on  Boole’s  inequality  (Seneta,  n.d.). 
Setting  a  test  level  of  a  =  0.025  for  each  of  the  two  tests  (by  using  the  0.025  quantile  or 
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by  multiplying  the  minimum  obtained  /;- value  by  two  and  comparing  the  product  to  0.05) 
conservatively  allows  for  maintaining  a  simultaneous  test  level  of  er  =  0.05.  The 
composite  A'NN  test  decision  rule  is  as  follows: 


Test  Decision  = 


J  reject  H0,  if  either  T  or  K  reject  at  a  a 
[fail  to  reject  H  Q,  otherwise. 


0.025  level  of  test; 


(41) 


Applying  the  composite  kNN  test  to  the  simulation  scenario  of  Table  7  results  in  the 
change-detection  powers  determined  from  simulation  shown  in  Table  8.  In  Section  E  of 
this  chapter,  the  composite  A'NN  test  and  the  /^-factor  sum  of  pair-differences  test  are 
compared. 


Table  8.  Simulation  calculations  of  the  power  of  the  composite  &NN  test  to 
detect  a  change  of  magnitude  A  at  observation  101  of  200  drawn 
from  a  five-dimensional  multivariate  normal  sampling  distribution, 
with  10,000  simulations  per  vignette.  Jump  in  the  first  component 
of  the  mean  vector  is  to  the  left;  jump  in  all  components  of  the 
scale  is  to  the  right. 


Emvn;  p  —  5 

A 

Mean  Jump 

Scale  Jump 

0.00 

0.05 

0.05 

0.25 

0.09 

0.98 

0.50 

0.30 

1.00 

0.75 

0.73 

1.00 

1.00 

0.97 

1.00 

c.  Selection  of  k 

For  statistics  derived  from  either  the  minimum  ^-factor  or  the  A'NN  subgraphs, 
based  on  k  =  N/2,  the  k  is  motivated  by  the  desire  to  include  a  rich  subgraph  and  to 
provide  a  test  comparable  in  power  to  the  Ensemble  Sum  of  Pair-Maxima  test  proposed 
by  Ruth  (2009).  There  is  justification  for  considering  different  k  values,  which  we  now 
explore.  The  detection  power  of  our  statistics  may  depend  on  how  many  matches  or 
neighbors  included  in  the  subgraph.  We  examine  the  detection  power  as  a  function  of  k 
and  recommend  values  of  k.  We  start  by  examining  the  behavior  of  the  ESPM  test 
statistic. 
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(1)  Ensemble  Sum  of  Pair-Maxima  Behavior  as  k  Increases 

The  ESPM  test  statistic  is  the  maximum  of  a  process  considering  all  values  of  k 
up  to  k  =  N/2.  The  process  is  constructed  such  that  the  null  hypothesis  expected  value  of 
the  statistic  is  zero.  Any  critical  value  exceedance  occurring  from  k  =  1  to  k  -  N/2 
results  in  change  detection  (rejection  of  the  null  hypothesis).  In  many  cases,  however,  it 
was  observed  that  exceedance  of  the  critical  value  at  any  point  in  the  sequence  usually 
was  met  with  exceedance  at  the  k  =  N/2  point  as  well.  Understandably,  using  the  Ap¬ 
point  of  the  sequence  should  result  in  a  loss  of  power,  which  we  suspect  will  be  minimal 
under  most  circumstances.  Figure  11  shows  the  ESPM  responses  as  the  number  of 
ensemble  matches  increases  for  two  of  the  change  detection  simulation  scenarios 
described  earlier:  an  abrupt  jump  change  in  the  first  component  of  the  mean  vector,  and  a 
gradual  drift  change  in  scale  in  all  five  dimensions,  respectively,  of  200  five-dimensional 
observations  drawn  from  a  multivariate  normal  distribution.  Figures  depicting  ESPM 
response  in  the  multivariate  normal  null  hypothesis  case,  mean  drift  case,  and  scale  jump 
case  are  given  in  Appendix  B. 
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Number  of  Ensemble  Matches  Number  of  Ensemble  Matches 

Figure  11.  The  Ensemble  Sum  of  Pair-Maxima  test  statistic  as  the  number  of 
ensemble  matches  increases  from  1  to  N/2  for  two  change  detection 
scenarios,  with  100  simulations  per  scenario.  To  the  left,  a  0.5 
magnitude  jump  inserted  at  observation  r  =  101  in  the  first  component 
of  the  mean  vector  of  a  five-dimensional  multivariate  normal 
distribution.  To  the  right,  a  0.5  magnitude  drift  change  started  at 
observation  r  =  101  in  the  standard  deviation  of  a  five-dimensional 
multivariate  normal  distribution.  The  thick  black  line  is  the  critical 
value.  Red  traces  indicate  the  ESPM  statistic  exceeding  the  critical 
value  based  on  an  a-  0.05  level  of  test.  Gray  traces  do  not  exceed  the 

critical  value. 


For  the  change  in  the  mean,  shown  on  the  left  in  Figure  11,  although  the  ESPM 
traces  appear  to  be  generally  increasing  as  k  increases,  the  ESPM  value  first  exceeds  and 
then  drops  below  the  critical  value  in  11  of  the  100  simulations.  These  cases  show  as  the 
red  lines  that  end  below  the  thick  black  line.  When  we  consider  only  the  trace  values  at 
k  =  100,  the  detection  power  falls  from  0.57  to  0.46.  The  variance  drift  case,  shown  on 
the  right  in  Figure  1 1,  is  even  more  pronounced.  The  ESPM  value  exceeds  and  then  drops 
below  the  critical  value  in  33  of  the  100  simulations.  Considering  only  the  trace  values  at 
k  =  100,  the  detection  power  falls  from  0.59  to  0.26. 

(2)  Detection  Power  as  k  Varies  -  Minimum  k-Factor  Sum  of  Pair-Differences 

To  evaluate  the  influence  of  k  on  the  detection  power  of  the  sum  of  pair- 
differences  (SPD)  based  on  the  minimum  k-factor  subgraph,  we  investigate  values  of  k 
from  0.05V  to  0.5 V  for  V=  200.  For  some  of  the  simulation  scenarios  outlined  earlier  in 
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this  section,  we  determine  the  change-detection  power  of  the  minimum  ^-factor  SPD.  We 
present  the  results  in  Table  9,  highlighting  the  maximum  detection  power  for  each  case. 


Table  9.  Test  power  to  detect  a  change  based  on  pair-differences 

determined  from  a  the  minimum  ^-factor  for  a  range  of  values  of  k 
on  a  subset  of  sampling  distributions,  dimensions,  change 
parameters,  change  types,  and  change  magnitudes,  using  change 
point  r  — 101  based  onN  =  200,  a  =  0.05 ,  and  10,000  simulations 
per  vignette.  Dimensions  are  p  =  5  unless  otherwise  noted.  The 
maximum  powers  for  each  case  are  shaded. 


Detection  Power  as  k  Varies  -  minimum  k- factor 


Simulation  Scenario 

10 

20 

30 

40 

50 

60 

70 

80 

90 

100 

FMVn  mean  jump  0.75 

0.71 

0.83 

0.88 

0.90 

0.92 

0.92 

0.93 

0.93 

0.94 

0.94 

FMvn  mean  drift  0.75 

0.31 

0.40 

0.46 

0.50 

0.53 

0.55 

0.54 

0.55 

0.57 

0.56 

FMvn  mean  jump  0.75  p  =  20 

0.44 

0.57 

0.61 

0.65 

0.70 

0.71 

0.73 

0.75 

0.72 

0.74 

FMvn  mean  drift  0.75  p  =  20 

0.17 

0.22 

0.24 

0.26 

0.29 

0.30 

0.31 

0.33 

0.30 

0.31 

FMvn  scale  jump  0.50 

0.96 

0.98 

0.97 

0.97 

0.97 

0.96 

0.94 

0.94 

0.91 

0.90 

FMvn  scale  drift  0.75 

0.94 

0.96 

0.96 

0.95 

0.94 

0.92 

0.90 

0.89 

0.85 

0.81 

Fmix  mean  jump  0.75 

0.50 

0.62 

0.68 

0.73 

0.75 

0.77 

0.79 

0.79 

0.80 

0.80 

Fmix  mean  drift  0.75 

0.21 

0.26 

0.30 

0.33 

0.37 

0.38 

0.38 

0.40 

0.40 

0.39 

Fmix  mean  jump  0.75  p  =  20 

0.20 

0.28 

0.31 

0.35 

0.37 

0.38 

0.40 

0.41 

0.39 

0.39 

Fmix  scale  jump  0.50 

0.79 

0.87 

0.89 

0.90 

0.89 

0.89 

0.87 

0.86 

0.83 

0.80 

Fmix  scale  drift  0.75 

0.74 

0.83 

0.84 

0.85 

0.85 

0.84 

0.82 

0.80 

0.77 

0.74 

FWeib  yd  jump  0.75 

0.82 

0.93 

0.96 

0.96 

0.97 

0.97 

0.97 

0.97 

0.97 

0.96 

Fweib  P  drift  0.75 

0.70 

0.84 

0.89 

0.91 

0.92 

0.93 

0.93 

0.93 

0.93 

0.92 

Fweib  ydjump  0.75  p  =  20 

0.51 

0.64 

0.68 

0.71 

0.72 

0.72 

0.71 

0.71 

0.70 

0.68 

For  the  ^-factor  SPD,  the  value  of  k  corresponding  to  the  highest  detection  power 
is  seen  to  depend  on  the  change  parameter  (mean  or  scale)  more  than  on  the  sampling 
distribution,  change  type  (jump  or  drift),  or  dimension.  The  detection  power  for  k  =  N/2  is 
at  or  near  the  highest  detection  power  for  changes  in  mean  and  changes  in  the  j8 
parameter  of  the  multivariate  Weibull  (FWeib).  Cases  with  a  change  in  scale  see  a  eight  to 
15  percentage  points  drop  in  detection  power  from  a  peak  occurring  between  k  =  N/ 10  to 
k  =  N/5.  For  the  change  of  scale  cases  there  is  still  considerable  power  at  k  =  N/2 ,  but  it  is 
not  the  peak  power. 
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(3)  Recommended  Value  for  A  -  Minimum  A-Factor 


We  do  lose  power  in  the  scale  change  cases,  but  at  k  =  Nf 2,  the  A'- factor  SPD  is 
near  enough  to  the  maximum  power  in  most  of  the  multivariate  normal,  heavy-tailed,  and 
skewed  cases.  The  empirically  observed  computation  time  to  obtain  the  minimum 
A-f actor  is  approximately  invariant  with  respect  to  A  as  shown  in  Section  B  of  this 
chapter. 

(4)  Detection  Power  as  k  Varies  -  A-Nearest  Neighbors 

We  similarly  examine  the  composite  ANN  test,  with  statistics  from  (22)  and  (39) 
and  the  Bonferroni  adjustment  of  (41).  We  present  the  results  in  Table  10,  highlighting 
the  maximum  detection  power  for  each  case. 


Table  10.  Test  power  to  detect  a  change  based  on  the  composite  ANN  test  for 
a  range  of  values  of  A  on  a  subset  of  sampling  distributions, 
dimensions,  change  parameters,  types,  and  magnitudes,  using 
change  point  T  =  101  based  on  N  =  200,  a  =  0.05 ,  and  10,000 
simulations  per  vignette.  Dimensions  are  p  =  5  unless  otherwise 
noted.  The  maximum  powers  for  each  case  are  shaded. 


Detection  Power  as  A  Varies  -  ANN 


Simulation  Scenario 

10 

20 

30 

40 

50 

60 

70 

80 

90 

100 

FMvn  mean  jump  0.75 

0.41 

0.54 

0.61 

0.65 

0.68 

0.69 

0.71 

0.71 

0.73 

0.73 

FMvn  mean  drift  0.75 

0.16 

0.21 

0.24 

0.27 

0.28 

0.29 

0.30 

0.31 

0.30 

0.31 

FMvn  mean  jump  0.75  p  =  20 

0.19 

0.26 

0.29 

0.32 

0.33 

0.36 

0.37 

0.37 

0.37 

0.37 

FMvn  mean  drift  0.75  p  =  20 

0.10 

0.12 

0.12 

0.13 

0.13 

0.14 

0.14 

0.14 

0.15 

0.15 

FMvn  scale  jump  0.25 

0.92 

0.96 

0.96 

0.96 

0.96 

0.96 

0.97 

0.97 

0.97 

0.98 

FMvn  scale  drift  0.25 

0.67 

0.71 

0.73 

0.72 

0.74 

0.75 

0.76 

0.77 

0.79 

0.80 

Fmix  mean  jump  0.75 

0.29 

0.39 

0.43 

0.47 

0.49 

0.49 

0.48 

0.47 

0.45 

0.44 

Fmix  mean  drift  0.75 

0.12 

0.16 

0.17 

0.19 

0.19 

0.19 

0.19 

0.19 

0.18 

0.18 

Fmix  mean  jump  0.75  p  =  20 

0.12 

0.13 

0.13 

0.14 

0.13 

0.13 

0.11 

0.12 

0.10 

0.10 

Fmix  scale  jump  0.25 

0.67 

0.74 

0.75 

0.77 

0.76 

0.77 

0.77 

0.77 

0.77 

0.75 

Fmix  scale  drift  0.25 

0.40 

0.46 

0.47 

0.47 

0.48 

0.50 

0.50 

0.51 

0.51 

0.50 

FWeib  [3  jump  0.75 

0.58 

0.74 

0.80 

0.83 

0.85 

0.86 

0.87 

0.88 

0.89 

0.89 

Fweib  P  drift  0.75 

0.33 

0.42 

0.47 

0.51 

0.53 

0.56 

0.56 

0.60 

0.61 

0.62 

Fweib  ySjump  0.75  p  =  20 

0.41 

0.52 

0.59 

0.63 

0.65 

0.66 

0.68 

0.68 

0.68 

0.67 

For  all  of  the  multivariate  normal  (Fmvn)  and  multivariate  Weibull  (FWeib)  cases 
examined,  as  well  as  the  multivariate  normal  mixture  ( Fmix )  scale  change  cases,  the 
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highest  detection  power  is  achieved  at  or  near  k  =  N/2.  For  the  Fmix  location  change  cases, 
the  peak  power  is  obtained  when  using  fewer  neighbors  in  the  subgraph. 

(5)  Recommended  Value  for  Composite  kNN  Test 

We  recommend  the  value  k  =  N/2.  There  is  evidence  this  is  not  the  best  choice  for 
heavy-tailed  distributions,  but  using  k  =  N/2  produces  a  test  that  detects  change 
reasonably  well  across  the  distributions  and  scenarios  tested. 

Our  statistics  now  incorporate  these  explorations  into  the  score  function 
transformation,  directional  information,  and  value  of  k.  A  more  thorough  understanding 
of  the  interaction  between  the  score  function  transformation  and  the  value  of  k  is  left  to 
future  research. 

D.  OBTAINING  A  CRITICAL  OR /?-VALUE 

In  this  section,  we  compare  and  contrast  four  methods  of  obtaining  a  critical  or 
/;- value  for  testing  purposes.  We  consider  accuracy  and  computational  implications  of  the 
methods.  To  demonstrate  concepts,  we  use  the  minimum  half  factor  sum  of  pair- 
differences  (SPD)  statistic.  The  distributions  of  our  statistics  do  not  have  closed  form 
solutions,  although  the  first  two  moments  of  the  minimum  half  factor  SPD  statistic  are 
available. 

1.  Critical  Values  from  Probability  Inequalities 

Knowing  the  expected  value  and  variance,  we  could  generate  a  critical  value  at  an 
appropriate  level  of  test  based  on  Chebyshev’s  inequality  as  shown  in  (42),  but  this  is 
known  to  be  overly  conservative  (Wackerly,  Mendenhall,  &  Scheaffer,  2008). 

Z_T-E[T] 

’  W\  (42) 

p‘'[lzb*A^- 

We  slightly  sharpen  this  inequality  to  a  one-sided  Chebyshev  bound,  also  known  as 
Cantelli’s  inequality  (Ion,  2001).  The  lower-tail  version  of  the  bound  is  shown  in  (43): 
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(43) 


Pr  [Z<-k\<— - . 

L  J  /r+1 

If  the  distribution  of  Z  is  unimodal,  which  our  experience  through  simulations  suggests  is 
a  reasonable  assumption  for  the  minimum  half  factor  SPD  statistic,  an  even  sharper 
inequality  is  possible  (Ion,  2001;  Bickel  &  Krieger,  1992): 

Pr  [Z<-k\<  4  K>S.  (44) 

9(/r  +l) 

For  example,  at  a  0.05  level  of  test,  the  one-sided  Chebyshev  inequality  of  (43)  gives 
critical  value  K=  -4.36,  while  the  unimodal  bound  of  (44)  gives  K=  —  2.81.  Were  our 
statistic  known  to  be  normally  distributed,  the  bound  for  a  0.05  level  of  test  would  be  the 
familiar  K=  -1.645.  Using  the  100,000  null  hypothesis  simulations  discussed  in 
Section  C  of  this  chapter,  the  unimodal  bound  ( K=  -2.81)  results  in  an  approximate  level 
of  test  of  0.010.  It  also  is  possible  to  derive  a  conservative  p- value  for  a  test  by  allowing 
K  to  be  the  standardized  value  of  the  observed  test  statistic.  For  the  example  data  sets  of 
Chapter  II  Section  A,  the  half  factor  SPD  test  applied  to  the  no-change  case  and  the 
change  case  gives  standardized  values  of  Z  =  -0.21  and  Z  =  -8.07,  respectively.  The  no¬ 
change  case  fails  to  reject  the  null  hypothesis,  while  the  half  factor  SPD  test  for  the 
change  case  provides  sufficient  evidence  to  reject  the  null  hypothesis  of  no  change. 

2.  Critical  Values  from  Control  Data 

A  second  method  to  obtain  a  critical  or  p- value  is  the  use  of  control  data  to 
establish  the  distribution  of  our  statistic.  By  having  ample  control  data,  or  by  simulating 
enough  null  hypothesis  scenarios  (say,  100,000),  we  obtain  an  appropriate  percentile  of 
the  desired  statistic  under  the  null  hypothesis,  or  we  can  assess  the  strength  of  evidence 
represented  by  the  observed  value  of  the  statistic  against  it. 

There  are  at  least  one  advantage  and  two  drawbacks  to  estimating  a  critical  value 
from  data.  The  advantage  is  that,  once  it  is  established,  it  is  used  indefinitely  to  signal 
change,  assuming  future  data  follow  the  same  distribution  when  not  undergoing  change. 
A  significant  drawback  is  the  need  for  the  control  data,  which  are  not  always  available. 


48 


Lacking  control  data,  simulation  is  used  to  approximate  the  critical  or  p- value.  This 
method  also  assumes  that  the  null  hypothesis  is  true  for  the  control  data.  If  the  alternative 
hypothesis  is  true  in  our  control  data,  the  underlying  graphical  structure  reflects  that;  we 
then  attain  an  inaccurate  critical  value. 


3.  Critical  or  p-Values  from  Permutation  Testing 

In  lieu  of  solving  for  the  minimum  half  factor  a  large  number  of  times,  we  use  the 
principles  outlined  in  Good  (2005)  to  obtain  a  critical  value.  Under  the  null  hypothesis, 
we  solve  for  the  minimum  half  factor  once.  We  then  permute  the  vertex  labels,  say 
100,000  times,  maintaining  the  underlying  minimum  subgraph  structure.  Computing  the 
desired  statistic  for  each  permutation,  we  establish  the  null  hypothesis  permutation 
distribution  of  the  test  statistic.  Finally,  we  select  the  desired  quantile.  This  method 
requires  just  one  extraction  of  the  minimum  subgraph,  from  just  one  set  of  control  data. 

In  a  similar  manner  we  derive  a  permutation  p-value.  The  observed  value  of  the 
statistic  is  compared  to  the  permutation  distribution  to  obtain  the  p-value: 


p-value  = 


#{permutation  value  <  observed  value} 
#  of  permutations 


(45) 


Consider  the  example  data  sets  from  Chapter  II  Section  A.  For  the  no-change  case  and 
the  change  case,  respectively,  the  minimum  half  factor  SPD  test  results  in  permutation 
p-values  of  0.38  and  0.00,  and  the  composite  kNN  test  results  in  permutation  p-values  of 
0.52  and  0.00.  For  both  tests,  the  null  hypothesis  is  not  rejected  in  the  no-change  case,  but 
is  rejected  in  the  change  case. 


4.  Numerical  Experiments 

Using  two  sets  of  simulations,  we  use  the  same  sampling  distributions  described 
in  Section  C  of  this  chapter  to  compare  the  various  methods  of  obtaining  a  p-value. 
Results  of  these  experiments  are  summarized  in  Table  11.  As  expected,  a  test  based  on 
the  uni  modal  probability  inequality  (44)  is  notably  conservative.  The  quantiles  obtained 
by  the  second  and  third  methods  are  nearly  identical.  These  findings  suggest  that  we  may 
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forgo  many  minimum  subgraph  calculations  and  instead  use  one  minimum  subgraph, 
permuting  the  observation  labels,  as  is  usually  done  in  a  permutation  test  (Good,  2005). 


Table  11.  Quantiles  for  the  standardized  minimum  half  factor  sum  of  pair- 
differences  at  the  a  =  0.05  level  of  test,  determined  by  100,000  null 
hypothesis  simulations  (Simulation),  and  determined  by  a  single 
null  hypothesis  simulation  and  100,000  permutations  of  the 
observation  labels  (Permutation),  for  iV=  200  simulated  null 
hypothesis  observations.  The  null  hypothesis  minimum  half  factor 
sum  of  pair-differences  is  not  normally  distributed. 


Dimension 

Method 

F 

1  MVN 

0.05  quantile 

F 

mix 

0.05  quantile 

F 

1  Weib 

0.05  quantile 

P  =  5 

Simulation 

Permutation 

-1.790 

-1.791 

-1.779 

-1.774 

-1.798 

-1.791 

o 

(N 

II 

o. 

Simulation 

Permutation 

-1.719 

-1.728 

-1.717 

-1.720 

-1.713 

-1.718 

We  compare  the  change-detection  power  of  the  minimum  half  factor  sum  of  pair- 
differences  using  the  above  methods  (using  null  hypothesis  data  to  establish  a  critical 
value  vs.  using  a  permutation  test  to  directly  obtain  the  /;- value)  for  a  subset  of  the  cases 
from  Section  C  of  this  chapter.  As  shown  in  Tables  12  through  14,  the  change-detection 
power  is  virtually  unchanged. 
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Table  12. 


Test  power  to  detect  a  location  change  of  magnitude  A  in  the  first 
component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  multivariate  normal  sampling  distribution,  with  10,000 
simulations  per  vignette.  Dimension  p  =  5  case  is  to  the  left; 
dimension  p  =  20  case  is  to  the  right. 


F 

1  MVN  ’ 

mean  change 

p  = 

=  5 

A 

HF  CV 

HF  P 

0.00 

0.05 

0.05 

0.25 

0.16 

0.16 

0.50 

0.58 

0.56 

0.75 

0.94 

0.94 

1.00 

1.00 

1.00 

o 

CN 

II 

o. 

HF  CV 

HF  P 

0.05 

0.05 

0.10 

0.09 

0.32 

0.33 

0.73 

0.74 

0.97 

0.97 

HF:  minimum  Half  Factor  sum  of  pair-differences 
CV :  power  obtained  via  comparison  to  critical  value 
P:  power  obtained  via  permuting  index  labels 


Table  13.  Test  power  to  detect  a  change  of  magnitude  A  in  the  first 

component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  multivariate  normal  mixture  sampling  distribution,  with 
10,000  simulations  per  vignette.  Dimension  p  =  5  case  is  to  the 
left;  dimension  p  =  20  case  is  to  the  right. 


F  ■ 

mix  ’ 

mean  change 

p  = 

=  5 

A 

HF  CV 

HF  P 

0.00 

0.05 

0.05 

0.25 

0.12 

0.14 

0.50 

0.42 

0.40 

0.75 

0.80 

0.81 

1.00 

0.97 

0.98 

o 

CN 

II 

o. 

HF  CV 

HF  P 

0.05 

0.05 

0.07 

0.06 

0.17 

0.19 

0.40 

0.38 

0.71 

0.68 

HF:  minimum  Half  Factor  sum  of  pair-differences 
CV :  power  obtained  via  comparison  to  critical  value 
P:  power  obtained  via  permuting  index  labels 
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Table  14.  Test  power  to  detect  a  change  of  magnitude  A  in  the  first 

component  of  the  scale  parameter  / 3  of  observation  101  of  200 
drawn  from  a  multivariate  Weibull  sampling  distribution,  with 
10,000  simulations  per  vignette.  Dimension  p  =  5  case  is  to  the 
left;  dimension  p  =  20  case  is  to  the  right. 


F 

1  Weib  ’ 

j3  change 

p  = 

=  5 

A 

HF  CV 

HF  P 

0.00 

0.05 

0.05 

0.25 

0.25 

0.26 

0.50 

0.73 

0.73 

0.75 

0.96 

0.96 

1.00 

1.00 

1.00 

o 

CN 

II 

o. 

HF  CV 

HF  P 

0.05 

0.05 

0.16 

0.15 

0.59 

0.60 

0.92 

0.92 

0.99 

0.99 

HF:  minimum  Half  Factor  sum  of  pair-differences 
CV :  power  obtained  via  comparison  to  critical  value 
P:  power  obtained  via  permuting  index  labels 


5.  Recommendations  on  Obtaining p-Y alues 

Our  brief  exploration  yields  some  findings.  We  see  the  sharpest  inequality  is 
conservative.  We  determine  that  the  power  to  detect  change  is  independent  of  whether 
one  uses  a  null  hypothesis  simulation-derived  critical  value,  a  null  hypothesis 
permutation-test  critical  value,  or  a  p- value  directly  from  permutation  testing  on 
observations.  This  result  gives  the  investigator  liberty  to  choose  the  method  that  works 
best  for  the  situation. 

In  many  situations,  sufficient  control  data  will  not  be  available.  Even  if  control 
data  are  available,  assuming  the  null  hypothesis  carries  the  risk  that  the  alternative  is 
actually  true.  For  these  reasons,  and  in  light  of  attaining  equivalent  detection  powers 
using  the  latter  three  methods,  our  recommendation  is  to  use  permutation  testing  to 
directly  obtain  the  p- value  for  testing  purposes. 

E.  CHANGE-DETECTION  POWER  COMPARISONS 

We  present  computer  simulation  results  that  compare  the  change-detection  power 
of  the  k-factor  sum  of  pair-differences  (SPD)  test  and  the  composite  kNN  test  to 
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parametric  and  nonparametric  alternatives,  the  James,  James,  and  Siegmund  (JJS)  (1992) 
test  and  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  of  Ruth  (2009).  As  determined  in 
Section  C  of  this  chapter,  we  use  the  minimum  half  factor  SPD  test  ( k  =  N/2 )  with  score 
function  parameter  A.  =  1,  and  the  k  =  N/2  composite  A'NN  test  with  X  =  -0.3.  We  use  the 
simulation  scenarios  as  outlined  in  Section  C  of  this  chapter. 

We  extract  our  minimum  subgraphs  based  on  the  Euclidean  distance  between 
observations.  Ruth  (2009)  finds  that  using  the  Euclidean  distance  results  in  higher 
detection  powers  than  using  a  covariance-scaled  metric  such  as  the  Mahalanobis  distance. 
Detection  powers  are  based  on  10,000  simulations  per  vignette.  We  vary  the  sampling 
distribution  (multivariate  normal,  multivariate  normal  mixture,  multivariate  Weibull),  the 
dimension  (p  =  5  or  p  =  20),  the  change  parameter  (location  or  scale),  the  change  type 
(jump  or  drift),  and  the  change  magnitude  (0.0  to  1.0  in  increments  of  0.25). 

1.  Multivariate  Normal  Vignettes 

Tables  15  through  17  show  the  detection  powers  for  the  multivariate  normal 
cases.  Discussion  follows  the  tables. 


Table  15.  Test  power  to  detect  a  location  change  of  magnitude  A  in  the  first 
component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  five-dimensional  multivariate  normal  sampling  distribution, 
with  10,000  simulations  per  vignette.  Jump  case  is  to  the  left;  drift 
case  is  to  the  right  (after  Ruth,  2009). 


ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  A'NN:  composite  A'NN  test 


Drift 

ESPM 

JJS 

HF 

£NN 

0.05 

0.05 

0.05 

0.05 

0.10 

0.09 

0.09 

0.06 

0.26 

0.23 

0.25 

0.13 

0.57 

0.54 

0.56 

0.30 

0.86 

0.86 

0.85 

0.60 

^MVN  ’  P 

mean  change 

Jump 

A 

ESPM 

JJS 

HF 

£NN 

0.00 

0.06 

0.05 

0.05 

0.05 

0.25 

0.17 

0.12 

0.16 

0.08 

0.50 

0.60 

0.53 

0.58 

0.31 

0.75 

0.94 

0.93 

0.94 

0.73 

1.00 

1.00 

1.00 

1.00 

0.97 
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Table  16.  Test  power  to  detect  a  location  change  of  magnitude  A  in  the  first 
component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  20-dimensional  multivariate  normal  sampling  distribution, 
with  10,000  simulations  per  vignette.  Jump  case  is  to  the  left;  drift 
case  is  to  the  right  (after  Ruth,  2009). 


ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  kNN:  composite  kNN  test 


Drift 

ESPM 

JJS 

HF 

kNN 

0.05 

0.04 

0.05 

0.05 

0.07 

0.05 

0.07 

0.06 

0.14 

0.09 

0.15 

0.09 

0.33 

0.23 

0.31 

0.15 

0.59 

0.49 

0.57 

0.29 

^mvN;p  =  2°; 
mean  change 

Jump 

A 

ESPM 

JJS 

HF 

kNN 

0.00 

0.05 

0.04 

0.05 

0.05 

0.25 

0.10 

0.06 

0.10 

0.07 

0.50 

0.34 

0.21 

0.32 

0.14 

0.75 

0.75 

0.64 

0.73 

0.38 

1.00 

0.98 

0.96 

0.97 

0.73 

Table  17.  Test  power  to  detect  a  scale  change  of  magnitude  A  in  all 
components  at  observation  101  of  200  drawn  from  a  five¬ 
dimensional  multivariate  normal  sampling  distribution,  with 
10,000  simulations  per  vignette.  Jump  case  is  to  the  left;  drift  case 
is  to  the  right  (after  Ruth,  2009). 


ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  kNN:  composite  kNN  test 


Drift 

ESPM 

JJS 

HF 

kNN 

0.05 

0.05 

0.05 

0.05 

0.13 

0.11 

0.11 

0.81 

0.53 

0.26 

0.38 

1.00 

0.93 

0.42 

0.81 

1.00 

1.00 

0.55 

0.98 

1.00 

^MVN  ’  P 
scale  change 

Jump 

A 

ESPM 

JJS 

HF 

kNN 

0.00 

0.05 

0.06 

0.05 

0.05 

0.25 

0.32 

0.09 

0.25 

0.98 

0.50 

0.96 

0.15 

0.90 

1.00 

0.75 

1.00 

0.21 

1.00 

1.00 

1.00 

1.00 

0.25 

1.00 

1.00 

A  parametric  change-detection  method  such  as  JJS  is  expected  to  work  only  with 
the  distribution  for  which  it  was  designed.  We  find  the  graph-theoretic  ESPM  and 
minimum  half  factor  SPD  tests  achieve  detection  powers  comparable  with  the  parametric 
JJS  test  in  the  multivariate  normal  mean  change  cases,  consistent  with  Ruth  (2009).  The 
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minimum  half  factor  SPD  detection  power  holds  closely  to  that  of  the  ESPM  test  for 
mean  changes,  but  its  power  lags  somewhat  in  the  scale  drift  case.  This  lag  is  further 
explored  in  Section  C  of  this  chapter.  The  composite  &NN  test  shows  dominance  in  scale 
changes,  but  falls  short  of  the  other  three  tests  in  detecting  mean  changes.  Compared  to 
the  other  tests,  there  is  a  greater  decrease  in  detection  power  for  the  NN  test  going  from 
lower  to  higher  dimensions. 

2.  Multivariate  Normal  Mixture  Vignettes 

Tables  18  through  20  show  the  detection  power  for  the  multivariate  normal 
mixture  cases  as  described  in  Section  C  of  this  chapter. 


Table  18.  Test  power  to  detect  a  location  change  of  magnitude  A  in  the  first 
component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  five-dimensional  multivariate  normal  mixture  sampling 
distribution,  with  10,000  simulations  per  vignette.  Jump  case  is  to 
the  left;  drift  case  is  to  the  right.  Shading  indicates  excessive  false 
alarm  rate  (after  Ruth,  2009). 


ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

HF:  Half  Factor  sum  of  pair-differences  A'NN:  composite  &NN  test 


f^p  =  5; 

mean  change 

Jump 

A 

ESPM 

JJS 

HF 

kNN 

0.00 

0.05 

0.20 

0.05 

0.05 

0.25 

0.12 

0.22 

0.12 

0.07 

0.50 

0.42 

0.32 

0.41 

0.16 

0.75 

0.81 

0.56 

0.80 

0.43 

1.00 

0.97 

0.82 

0.97 

0.78 

Drift 

ESPM 

JJS 

HF 

kNN 

0.05 

0.20 

0.05 

0.05 

0.08 

0.21 

0.08 

0.06 

0.18 

0.25 

0.18 

0.09 

0.39 

0.35 

0.39 

0.17 

0.67 

0.48 

0.66 

0.34 
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Table  19.  Test  power  to  detect  a  location  change  of  magnitude  A  in  the  first 
component  of  the  mean  vector  at  observation  101  of  200  drawn 
from  a  20-dimensional  multivariate  normal  mixture  sampling 
distribution,  with  10,000  simulations  per  vignette.  Shading 
indicates  excessive  false  alarm  rate. 

F^p  =  20; 
mean  change 
A 

0.00 
0.25 
0.50 
0.75 
1.00 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  A'NN:  composite  A'NN  test 


Table  20.  Test  power  to  detect  a  scale  change  of  magnitude  A  in  all 
components  at  observation  101  of  200  drawn  from  a  five¬ 
dimensional  multivariate  normal  mixture  sampling  distribution, 
with  10,000  simulations  per  vignette.  Jump  case  is  to  the  left;  drift 
case  is  to  the  right.  Shading  indicates  excessive  false  alarm  rate 
(after  Ruth,  2009). 

Fm^p  = 5; 
scale  change 

A 

0.00 
0.25 
0.50 
0.75 
1.00 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  A'NN:  composite  &NN  test 

For  the  multivariate  normal  mixture  distribution  with  mean  changes,  the  ESPM 
test  demonstrates  nearly  identical  change-detection  power  as  the  minimum  half  factor 
SPD  test.  For  scale  changes,  the  ESPM  test  shows  detection  power  greater  than  the 

minimum  half  factor  SPD  test.  The  composite  /cNN  test  again  demonstrates  relatively  low 
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Jump 


ESPM 

JJS 

HF 

£NN 

0.05 

0.20 

0.05 

0.05 

0.31 

0.22 

0.24 

0.74 

0.89 

0.25 

0.80 

1.00 

1.00 

0.30 

0.99 

1.00 

1.00 

0.33 

1.00 

1.00 

Drift 


ESPM 

JJS 

HF 

£NN 

0.05 

0.21 

0.05 

0.05 

0.14 

0.27 

0.11 

0.50 

0.47 

0.37 

0.37 

0.94 

0.85 

0.45 

0.74 

1.00 

0.99 

0.53 

0.95 

1.00 

detection  power  for  mean  changes  but  shows  superior  detection  power  in  scale  changes. 
Comparing  the  lower  dimensional  and  higher  dimensional  mean  jump  cases,  we  see  a 
greater  decrease  in  detection  power  for  the  composite  &NN  test  compared  to  the  other 
tests.  The  JJS  test  fails  to  hold  its  level  for  this  heavy-tailed  distribution. 

3.  Multivariate  Weibull  Vignettes 

Tables  21  and  22  show  the  detection  power  for  the  multivariate  Weibull  cases  as 
defined  in  Section  C  of  this  chapter. 


Table  21.  Test  power  to  detect  a  scale  parameter  change  of  magnitude  A  in 
the  first  component  of  the  (3  vector  at  observation  101  of  200 
drawn  from  a  five-dimensional  multivariate  Weibull  sampling 
distribution,  with  10,000  simulations  per  vignette.  Jump  case  is  to 
the  left;  drift  case  is  to  the  right.  Shading  indicates  excessive  false 
alarm  rate  (after  Ruth,  2009). 


^Wcib  ’  P 

j3  change 

A 

0.00 

0.25 

0.50 

0.75 

1.00 


Jump 

ESPM 

JJS 

HF 

kNN 

0.05 

0.09 

0.05 

0.05 

0.27 

0.23 

0.25 

0.17 

0.78 

0.70 

0.73 

0.55 

0.98 

0.97 

0.96 

0.88 

1.00 

1.00 

1.00 

0.99 

Drift 

ESPM 

JJS 

HF 

kNN 

0.05 

0.09 

0.05 

0.05 

0.13 

0.18 

0.12 

0.13 

0.42 

0.47 

0.38 

0.33 

0.74 

0.79 

0.68 

0.61 

0.93 

0.95 

0.89 

0.84 

ESPM:  Ensemble  Sum  of  Pair-Maxima 
HF:  Half  Factor  sum  of  pair-differences 


JJS:  James,  James,  and  Siegmund  test 
A'NN:  composite  &NN  test 
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Table  22.  Test  power  to  detect  a  scale  parameter  change  of  magnitude  A  in 
the  first  component  of  the  [3  vector  at  observation  101  of  200 
drawn  from  a  20-dimensional  multivariate  Weibull  sampling 
distribution,  with  10,000  simulations  per  vignette.  Shading 
indicates  excessive  false  alarm  rate. 


Fw  eib;p  =  2°; 

(3  change 

Jump 

A 

ESPM 

JJS 

HF 

k  NN 

0.00 

0.05 

0.07 

0.05 

0.05 

0.25 

0.17 

0.12 

0.16 

0.11 

0.50 

0.62 

0.33 

0.59 

0.32 

0.75 

0.94 

0.70 

0.92 

0.68 

1.00 

1.00 

0.92 

0.99 

0.93 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 
HF:  Half  Factor  sum  of  pair-differences  A'NN:  composite  A'NN  test 


For  the  multivariate  Weibull  distribution,  the  ESPM  test  exceeds  the  detection 
power  of  the  minimum  half  factor  SPD  test  and  the  composite  A'NN  test  across  the  two 
dimensions  (p  =  5  and  p  =  20)  considered.  The  composite  A'NN  test  nearly  matches  the 
performance  of  the  minimum  half  factor  SPD  test  for  the  (3  drift  scenario.  The  composite 
A'NN  test  shows  a  greater  decrease  in  power  from  the  lower  dimensional  [3  jump  case  to 
the  higher  dimensional  [3  jump  case  than  the  ESPM  and  minimum  half  factor  SPD  tests. 
The  JJS  test  fails  to  hold  its  level  for  this  skewed  distribution. 

4.  Summary  of  Change-Detection  Power  Comparisons 

Strengths  and  limitations  of  the  minimum  half  factor  and  kNN  statistics  come 
through  these  simulations.  The  minimum  half  factor  SPD  test  shows  power  to  detect 
change  across  all  14  simulated  scenarios,  at  or  near  the  power  of  the  ESPM  test  with  the 
notable  exceptions  of  the  multivariate  normal  and  heavy-tailed  scale  changes.  The 
composite  kNN  test  shows  superior  power  in  scale  change  cases,  but  trails  considerably 
in  all  other  cases. 
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IV.  GRAPH-THEORETIC  CHANGE-POINT  LOCALIZATION 


In  the  event  that  change  is  detected  in  the  sampling  distribution  of  a  series  of 
observations,  the  natural  question  is,  where  does  the  onset  of  change  occur?  In  this 
chapter,  we  propose  methods  for  change-point  localization  that  are  applied  if  the  null 
hypothesis  of  homogeneity  is  rejected.  We  limit  our  localization  efforts  to  a  search  for  a 
single  abrupt  change. 

A.  CHANGE-POINT  ESTIMATION  USING  EDGE  COUNT  DATA 

We  derive  change-localization  statistics  from  the  same  minimum  half  factor  and 
A'NN  subgraphs  that  detected  change  using  the  tests  described  in  Chapter  III.  These 
statistics  reveal  the  property  that,  when  moving  along  the  observation  sequence,  the  edges 
of  the  minimum  subgraph  exhibit  a  pattern  that  would  not  be  expected  under  the  null 
hypothesis.  By  counting  the  edges  going  to  or  coming  from  observations  with  a  lower 
index  value  than  the  current  observation,  we  can  localize  the  change  point  with 
reasonable  accuracy.  Recall  from  Chapter  III  Section  C,  a  prior  edge  is  an  edge  of  the 
minimum  subgraph  whose  destination  observation  label  is  less  than  the  source  (reference) 
observation  label. 

Under  the  alternative  hypothesis,  there  is  a  tendency  to  form  minimum  subgraph 
edges  among  observations  on  the  same  side  of  the  change  point  (Friedman  &  Rafsky, 
1979).  Observations  just  before  the  change  point  should  have  more  prior  edges  than 
observations  just  after  a  change  point.  In  a  minimum  subgraph,  observations  after  the 
change  point  are  more  likely  to  be  adjacent  to  observations  like  themselves  - 
observations  identified  with  a  higher  label  value.  For  a  single  abrupt  change,  we  expect 
observations  just  after  the  change  point  to  have  relatively  low  prior  edge  counts.  By 
finding  a  point  along  the  observation  sequence  where  the  prior  edge  count  drops  abruptly, 
we  estimate  the  change  point.  To  formalize  this  reasoning,  we  use  piecewise  regression 
based  on  the  prior  edge  counts. 
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1.  Statistical  Properties 

We  use  Li  to  denote  the  prior  edge  count  for  observation  i  based  on  the  minimum 
half  factor  or  the  outgoing  edges  of  the  A'NN  subgraph: 


j  1,  if  edge  (/,  /)  is  in  the  minimum  subgraph  /  ,\ 

,J  {' 0 ,  otherwise  ^ 

7=1 


(46) 


Note  that  Ln  =  Nf 2. 


We  use  (47)  to  obtain  the  prior  incoming  edge  count,  L' .  The  irregular  indegree 

of  the  A'NN  graph  edges  requires  different  treatment  for  the  prior  incoming  edges  than  the 
prior  outgoing  edges. 


j  1,  if  edge  ( j,i)  is  in  the  minimum  subgraph 
J‘  [O,  otherwise 

V=£^,.  Vie{2,...,AT}. 

7=1 


V(j,i)eE 


(47) 


Note  that  L'n  may  take  any  value  in  {0,  1,  1 }. 


Under  the  null  hypothesis,  L,  and  L'  have  a  hypergeometric  distribution 

(Wackerly,  Mendenhall,  &  Scheaffer,  2008).  For  the  minimum  half  factor  and  the 
outgoing  edges  of  the  kNN  subgraph,  the  probability  of  Li  taking  on  any  value  from  zero 
to  k  is  given  by 


Pr[i=/] 


(i-l) 

(  N-i  1 

{ l  J 

1 

1 

t  *  J 


where  N  —  1 :  population  size  (all  indices  except  itself) 

k  :  sample  size  (number  of  edges  incident  to  i) 
i  - 1 :  total  prior  edges  available 

l :  possible  values  for  the  prior  edge  count  jo,  1, . . . k } 
/</- 1 
k  -l  <  N  -i. 


(48) 


60 


We  know  the  null  hypothesis  expected  value  and  variance  of  the  prior  edge  count  for 
each  observation.  Later,  these  parameters  are  used  for  standardization: 


v[L]=k 


f 

V 

f 

V 


i- 1 ' 


r 

r  N-i' 

r  N-l-k} 

-u 

U-iJ 

{  N- 2  J 

(49) 


The  null  hypothesis  parameters  for  the  kNN  prior  incoming  edge  count  need  a 
slight  modification.  For  the  kNN  subgraph,  the  indegree  of  observation  i  (k')  may  take 

any  value  in  {0,  1,  . ..,  N -  1}.  The  conditional  distribution  of  L' ,  given  the  total  number 
of  incoming  edges  for  vertex  i,  is  hypergeometric: 


1 1,  if  edge  ( /  ,/)  is  in  the  subgraph  /  .  .\ 

In  „ti _ ’  / 


X  = 

10,  otherwise 


Pr 


1  N  1 

L'=l 

l 

II 

K 

[XI 

\j=hj*i  J 

0-0 


V  '  ) 


N-i 

N-l-l 


A 


vfkK 


(50) 


r 

(N-i] 

1 

1 

J 

-u 

U-iJ 

l  ^v-2  ) 

For  all  subgraphs,  the  hypergeometric  random  variables  describing  the  prior  edge 
count  have  the  indicated  expected  value  and  variance  under  the  null  hypothesis,  but  the 
random  variables  are  not  independent. 

We  search  for  a  change  point  by  optimizing  the  fit  of  a  two-piece  regression  line 
to  the  standardized  prior  edge  counts.  To  avoid  endpoint  effects,  we  do  not  consider 
indices  below  some  To  and  above  N  -  To.  Each  of  the  remaining  observations  is 
interrogated  as  a  possible  break  point,  and  the  break  point  providing  the  best  fit  for  the 
piecewise  regression  based  on  minimizing  the  residual  mean  squared  error  (MSE)  is 
chosen. 


61 


Piecewise  regression  may  allow  discontinuities  in  the  value  of  the  conditional 
mean  of  the  dependent  variable  or  vector,  and/or  only  allow  discontinuities  in  the  slope  of 
the  regression  line  (Koul  &  Qian,  2002).  For  our  application,  we  choose  a  model  that 
allows  discontinuity  continuity  in  the  value  of  the  conditional  mean  of  the  dependent 
variable.  Our  algorithm  is  a  variation  of  one  from  Crawley  (2007).  Our  localization 
method  is  specific  to  estimating  a  single  change  point. 

Our  regression  is  of  the  following  form: 

i :  observation  index,  i  e  {2,. . ., /v} 

T  :  inteiTogation  index,  r  e  {to,.  ..,N  —  to|; 

usually  T()  =  |^7V/l0j ,  where  |^aJ  is  the  largest  integer  <  A 
Z  :  standardized  prior  edge  count  for  index  i. 

For  each  re  {r0,K  ,/V-r0}  we  calculate: 


MSE(t)  =  X(z,.  -Kr  -Aj-A  >  A- hr U{i  >  A)2 

i= 2 


(52) 


Let  t  denote  the  index  value  that  minimizes  the  MSE(r).  We  take  f  as  our  estimator  of 
the  change  point. 

We  have  two  available  estimators  from  the  kNN  subgraph,  namely  the  prior 
incoming  edges  estimate  fIN  and  the  prior  outgoing  edges  estimate  f0UT .  We  take  f  to 

be  fIN  or  f0UT  depending  on  which  produces  the  smaller  value  of 


yggJL)  and  ygggrhoJ 

SSTm  SSTom  ' 

N  2  N 

SST]S  =  I(Z;„N  -Z|N)  ,  SSTom  =  I(z„OUT  -  Z0UT 

(=2  i=2 


(53) 


2.  A  Change-Point  Estimation  Procedure 

Including  the  change-detection  process,  we  proceed  along  these  steps: 
(1)  Extract  the  applicable  subgraph  (minimum  half  factor  or  kNN). 
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(2)  Use  the  procedures  of  Chapter  III  to  test  the  hypothesis  of  no  change. 

(3)  If  the  no-change  hypothesis  is  not  rejected,  stop  here  and  do  NOT  conduct 
change-point  localization. 

(4)  If  the  null  hypothesis  is  rejected,  obtain  the  change-point  estimator  f 
using  the  process  described  above. 

B.  CHANGE-POINT  LOCALIZATION:  ILLUSTRATIONS 

Returning  to  our  example  data  sets  from  Chapter  II  Section  A,  Figures  12  and  13 
provide  the  minimum  half  factor  and  10-NN  subgraphs,  respectively,  for  the  no-change 
case  and  the  change  case.  In  both  figures,  the  no-change  case  is  to  the  left  and  the  change 
case  is  to  the  right.  Observations  are  listed  along  the  bottom  and  top  of  each  figure  as  the 
reference  vertices  and  matching  vertices,  respectively.  The  prior  edges  for  each  reference 
vertex  are  red;  other  minimum  subgraph  edges  are  grey.  Figure  13  places  the  source 
observations  as  the  reference  vertices  to  highlight  /cNN  prior  outgoing  edges.  The  same 
figure  inverted  would  highlight  A'NN  prior  incoming  edges;  these  edges  are  grey  in 
Figure  13. 

As  seen  in  the  plots  on  the  left  of  Figures  12  and  13,  in  the  no-change  case,  some 
prior  edges  cross  from  higher  reference  vertices  to  lower  matching  vertices.  Red  is  seen 
throughout  the  plots.  From  the  plots  to  the  right,  it  is  evident  that  not  as  many  prior  edges 
cross  from  higher  reference  vertices  to  lower  matching  vertices  in  the  change  case  as  in 
the  no-change  case.  Figure  12  demonstrates  that  the  minimum  half  factor  is  a  10-regular 
subgraph,  while  Figure  13  shows  that  the  vertices  in  the  A'NN  subgraph  have  constant 
outdegree  and  fluctuating  indegree. 
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Matching  Vertex 


Matching  Vertex 


7  18  19  20 


8  9  10  11  12  13  14  15  16  17  18  19  20  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20 

Reference  Vertex  Reference  Vertex 

Figure  12.  Minimum  half  factor  for  the  example  data  sets.  Prior  edges  of 
reference  vertices  are  shown  in  red.  The  no-change  case  is  to  the  left, 
the  change  case  is  to  the  right. 


Figure  13.  The  10-nearest  neighbors  for  the  example  data  sets.  Prior  outgoing 
edges  of  reference  vertices  are  shown  in  red.  The  no-change  case  is  to 
the  left,  the  change  case  is  to  the  right. 


Table  23  amplifies  Figure  13,  showing  the  10-NN  of  each  vertex  as  well  as  the 
prior  outgoing  edge  count,  its  expected  value,  standard  deviation,  and  standardized  value, 
for  the  no-change  case  to  the  left  and  the  change  case  to  the  right.  The  standardized  prior 
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edge  count  on  the  left  of  Table  23  shows  no  discernible  pattern  for  the  no-change  sample. 
As  seen  on  the  right  of  Table  23,  the  standardized  prior  edge  count  (zd  for  the  change 
case  shows  a  clear  pattern.  All  values  of  z.i,  i<  11  are  positive,  while  almost  all  values  of 
Zi,  i  >  10  are  negative.  Analogous  tables  for  minimum  half  factor  and  prior  incoming  edge 
count  are  given  in  Appendix  C. 


Table  23.  The  10-nearest  neighbors  prior  outgoing  edge  count  (/,),  expected 
value,  standard  deviation,  and  standardized  value  for  example  data 
sets.  The  no-change  case  is  to  the  left,  the  change  case  is  to  the 

right. 

,  Change  in  Mean  Vector  from  (1,  0)  to 

Standard  Bivariate  Normal  °  .  . 

(3,  0)  at  Observation  i=  11 


Vertex  i 

li 

V 

a 

Zi 

1 

0 

0.00 

0.00 

— 

2 

0 

0.53 

0.50 

-1.05 

3 

2 

1.05 

0.69 

1.38 

4 

1 

1.58 

0.82 

-0.71 

5 

4 

2.11 

0.91 

2.08 

6 

5 

2.63 

0.98 

2.41 

7 

4 

3.16 

1.04 

0.81 

8 

2 

3.68 

1.08 

-1.56 

9 

3 

4.21 

1.10 

-1.10 

10 

4 

4.74 

1.12 

-0.66 

11 

7 

5.26 

1.12 

1.56 

12 

5 

5.79 

1.10 

-0.72 

13 

6 

6.32 

1.08 

-0.29 

14 

7 

6.84 

1.04 

0.15 

15 

7 

7.37 

0.98 

-0.37 

16 

9 

7.89 

0.91 

1.21 

17 

9 

8.42 

0.82 

0.71 

18 

10 

8.95 

0.69 

1.53 

19 

10 

9.47 

0.50 

1.05 

20 

10 

10.00 

0.00 

— 

Vertex  i 

h 

V 

a 

Zi 

1 

0 

0.00 

0.00 

— 

2 

1 

0.53 

0.50 

0.95 

3 

2 

1.05 

0.69 

1.38 

4 

3 

1.58 

0.82 

1.74 

5 

4 

2.11 

0.91 

2.08 

6 

5 

2.63 

0.98 

2.41 

7 

6 

3.16 

1.04 

2.73 

8 

5 

3.68 

1.08 

1.22 

9 

5 

4.21 

1.10 

0.72 

10 

9 

4.74 

1.12 

3.82 

11 

2 

5.26 

1.12 

-2.92 

12 

2 

5.79 

1.10 

-3.43 

13 

5 

6.32 

1.08 

-1.22 

14 

5 

6.84 

1.04 

-1.77 

15 

6 

7.37 

0.98 

-1.39 

16 

8 

7.89 

0.91 

0.12 

17 

8 

8.42 

0.82 

-0.52 

18 

8 

8.95 

0.69 

-1.38 

19 

10 

9.47 

0.50 

1.05 

20 

10 

10.00 

0.00 

— 

As  shown  previously,  using  the  tests  from  Chapter  III,  we  reject  the  null 
hypothesis  for  the  change  case.  Both  the  prior  incoming  and  prior  outgoing  edge  count 
regressions  result  in  f  =  11,  as  shown  in  Figure  14. 
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Observation 

Figure  14.  Standardized  count  of  nearest-neighbor  prior  edges  and  optimized 
piecewise  regression  lines,  for  the  10-nearest  neighbor  subgraph  formed 
from  the  example  data  sets.  The  prior  outgoing  edge  counts  are  in 
green,  the  prior  incoming  edge  counts  are  in  blue. 


Next,  we  illustrate  our  proposed  change  localization  procedure  on  a  larger  data 
set.  Starting  with  N  =  200  five-dimensional  multivariate  normal  simulated  observations, 
we  insert  an  abrupt  jump  in  the  first  component  of  the  mean  vector  at  observation  r  =  101 
by  adding  0.5  to  those  values.  Figure  15  shows  the  changed  component  of  our  data, 
plotted  by  observation  label. 
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Figure  15.  The  changed  component  of  our  five-dimensional  example  data, 
plotted  against  observation  label  (N  =  200).  Data  are  randomly 
generated  from  a  multivariate  normal  distribution.  The  first  component 
of  the  mean  vector  is  the  dotted  black  line,  showing  the  inserted  change 
of  magnitude  0.5  at  observation  101.  The  remaining  four  dimensions 

are  unchanged. 

Using  these  observations,  we  obtain  the  minimum  half  factor  and  calculate  the 
minimum  half  factor  sum  of  pair-differences  (SPD).  The  standardized  SPD  statistic  for 
these  data  is  -2.97,  which  is  less  than  the  critical  value  -2.81  based  on  the  unimodal 
bound  in  (44)  of  Chapter  III  Section  D.  By  permuting  the  observation  labels  10,000  times 
we  obtain  a  p-value  of  0.0075  leading  to  a  rejection  of  the  null  hypothesis  of 
homogeneity  at  the  a  =  0.05  test  level,  and  thus  conclude  that  the  observations  exhibit  a 
change  in  the  sampling  distribution. 

Now  for  all  indices  /e{2,...,/v}  we  determine  the  prior  edge  counts  /,  as 

described  in  (46).  For  example,  observation  15  has  its  100  edges  connecting  to  the 
following  indices  that  are  less  than  15  {2,  3,  4,  5,  6,  7,  8,  11,  13},  so  /i5  =  9.  Under  the 
null  hypothesis,  FI/15]  =7.04  and  SD[/i5]  =  1.81,  which  gives  a  standardized  count  of 
Zi5=  1.09  (54). 
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Mean  Jump  of  0.5  at  Observation  101 
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Using  the  standardized  counts,  we  search  for  the  optimal  point  to  break  a  piecewise 

regression.  We  iterate  through  the  indices  /  =  {20,  21, _ ,  180},  splitting  the  data  into 

two  subsets  just  past  each  considered  index.  We  fit  the  linear  regression  as  described  in 
(51)  and  (52).  The  index  that  minimizes  the  MSE  is  our  change-point  estimate. 

Figure  16  shows  the  optimized,  least-squares  piecewise  regression  lines  overlaid 
on  the  standardized  prior  edge  count  for  which  the  change  point  is  estimated  at  t  =  1 1 1. 


Figure  16.  Standardized  count  of  prior  edges  in  blue,  with  the  optimized 

piecewise  regression  lines  in  green,  for  the  minimum  half  factor  formed 
from  the  larger  data  set.  The  standardized  count  of  prior  edges  for 
observation  15  is  in  magenta  (zi5  =  1.09) 
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The  slopes  of  the  fitted  regression  lines  show  that  for  indices  less  than  the 
proposed  change  point,  prior  edges  were  more  likely  than  to  be  expected  under  the  null 
hypothesis  (z  >  0).  Just  after  the  proposed  change  point,  prior  edges  were  less  likely  than 
to  be  expected  under  the  null  hypothesis  (z  <  0).  Recall  that  our  example  is  for  a  mean 
jump  of  0.5  in  the  first  of  five  dimensions  at  r  =  101.  Finally,  Figure  17  shows  the 
estimated  change  point  plotted  with  the  changed  component  of  our  data. 
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Figure  17.  The  estimated  change  point  and  the  changed  component  of  our  five¬ 
dimensional  example  data  plotted  against  observation  label.  The  first 
component  of  the  mean  vector  is  the  dotted  black  line,  showing  the 
inserted  change  of  magnitude  0.5  at  observation  101.  The  estimated 
change  point  is  the  red  line.  The  remaining  four  dimensions  are 

unchanged. 
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In  our  evaluation  of  change-point  estimators,  we  regard  estimator  f  as  close  to 
the  truth  if  falls  in  a  narrow  window  centered  on  the  truth  with  a  width  0.05N.  For 
N  =  200,  the  window  width  is  10,  so  the  estimator  must  fall  within  T±  5.  Despite  change 
being  detected  and  localized,  the  estimated  change  point  of  f  =111  fails  this  test  and  is 
judged  not  to  be  close  to  the  actual  change  point  of  T=  101. 
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C.  LOCATION  ESTIMATOR  COMPARISONS 

We  use  a  series  of  simulations  to  evaluate  the  location  estimators.  The  simulations 
expose  the  estimators  to  different  distributions,  change  types,  and  change  magnitudes. 
Location  estimator  accuracy  is  assessed  in  two  ways.  We  primarily  report  a  success  rate, 
which  we  define  as  the  fraction  of  times  the  change  point  was  localized  within  the  0.05A- 
wide  window  centered  on  true  change  point.  Localization  occurs  only  if  change  is 
detected  -  if  no  change  is  detected,  that  is  counted  as  a  localization  failure.  We  also 
report  the  interquartile  range  (IQR)  of  the  estimated  change  points. 

We  test  our  location  estimators  under  a  variety  of  scenarios,  similar  to  the 
scenarios  considered  in  Chapter  III.  For  our  location  estimator  simulations,  we  simulate 
five-dimensional  observations.  We  place  an  abrupt  (jump)  change  in  the  middle  of  the 
observation  sequence.  We  conduct  1,000  simulations  per  scenario,  and  display  the 
success  rate  as  the  fraction  of  times  out  of  1,000  that  the  change  was  detected  and 
localized  within  plus  or  minus  five  indices  of  the  actual  change  point.  Tables  24  through 
26  present  the  results  of  a  simulation  study  of  success  rates  and  IQRs  for  change-point 
estimations  using  the  minimum  half  factor  prior  edge  count  (HF)  and  the  best  of  the  &NN 
prior  incoming  edge  count  and  the  A'NN  prior  outgoing  edge  count  as  described  in  (53). 
Each  table  identifies  the  distribution  (  FMVN  ,  Fmjx ,  FWeib ) ,  the  parameter  (mean,  scale) 

under  change,  and  the  change  magnitudes  (0.25,  0.50,  0.75,  and  1.00).  A  sample  size  of 
A  =200  and  actual  change  point  of  r  =  101  is  used  in  all  vignettes.  For  changes  in 
distribution  mean,  and  for  the  change  in  the  scale  of  the  multivariate  Weibull  distribution, 
we  simulate  the  change  in  only  the  first  component  of  each  observation.  For  scale 
changes  in  the  multivariate  normal  and  multivariate  normal  mixture  distributions,  we 
simulate  the  change  in  all  components. 
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1.  Multivariate  Normal  Vignettes 

Table  24  shows  the  location  estimator  success  rate  and  IQR  for  the  multivariate 
normal  cases. 


Table  24.  Change-point  localization  success  rate  and  IQR  for  the  five¬ 
dimensional  multivariate  normal  simulated  observations  with 
1,000  simulations  per  vignette.  A  change  of  magnitude  A  is 
inserted  at  observation  r=  101  of  N  =  200.  Mean  vector  change  is 
to  the  right;  scale  change  is  to  the  left.  Success  rate  is  defined  as 
the  fraction  of  times  the  change  is  detected  at  a  0.05  test  level  and 
change-point  estimator  equal  to  I  ±  5.  IQR  is  in  parentheses. 


Tmvn;  p  —  5 
A 
0.25 
0.50 
0.75 
1.00 


Mean  Jump 

HF 

kNN 

0.02  (55) 

0.01  (71) 

0.17(25) 

0.08  (40) 

0.52  (8) 

0.38  (10) 

0.76  (3) 

0.73  (4) 

Scale  Jump 

HF 

£NN 

0.04  (65) 

0.23  (28) 

0.45  (10) 

0.58(11) 

0.80  (3) 

0.85  (2) 

0.94  (1) 

0.96  (1) 

HF:  SPD  test  and  edges  from  the  minimum  Half  Factor 
A'NN:  Composite  test  and  edges  from  the  ^-Nearest  Neighbors 


For  each  of  the  two  change-point  estimators,  in  both  the  location  and  scale  change 
vignettes,  the  success  rate  increases  and  the  IQR  narrows  as  the  change  magnitude 
increases.  For  the  location  change  vignettes,  the  minimum  half  factor  success  rate 
exceeds  that  of  the  A'NN  estimator.  In  the  multivariate  normal  scale  change  scenarios,  the 
success  rate  of  the  A'NN  estimators  exceeds  that  of  the  minimum  half  factor.  Both 
distinctions  narrow  as  the  change  magnitude  increases. 
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2.  Multivariate  Normal  Mixture  Vignettes 

Table  25  shows  the  location  estimator  success  rate  and  IQR  for  the  multivariate 
normal  mixture  cases. 


Table  25.  Change-point  localization  success  rate  and  IQR  for  the  five¬ 
dimensional  multivariate  normal  mixture  simulated  observations 
with  1,000  simulations  per  vignette.  A  change  of  magnitude  A  is 
inserted  at  observation  r=  101  of  N  =  200.  Mean  vector  change  is 
to  the  right;  scale  change  is  to  the  left.  Success  rate  is  defined  as 
the  fraction  of  times  the  change  is  detected  at  a  0.05  test  level  and 
change-point  estimator  equal  to  T  ±  5.  IQR  is  in  parentheses. 


Fmi\t  P  —  5 

A 

0.25 

0.50 

0.75 

1.00 


Mean  Jump 

HF 

kNN 

0.01  (70) 

0.01  (69) 

0.11  (36) 

0.03  (57) 

0.34(16) 

0.15(22) 

0.59  (7) 

0.42  (9) 

Scale  Jump 

HF 

£NN 

0.05  (48) 

0.13(42) 

0.36(14) 

0.35  (20) 

0.70  (5) 

0.58  (12) 

0.85  (3) 

0.74  (4) 

HF:  SPD  test  and  edges  from  the  minimum  Half  Factor 
A'NN:  Composite  test  and  edges  from  the  ^-Nearest  Neighbors 


Similar  to  the  multivariate  normal  location  change  case,  the  minimum  half  factor 
success  rate  consistently  exceeds  the  success  rate  of  the  /cNN  estimator  in  the  multivariate 
normal  mixture  location  change  case.  The  results  are  not  as  pronounced  for  the 
multivariate  normal  mixture  scale  change  case;  the  minimum  half  factor  estimator 
success  rate  exceeds  that  of  the  A'NN  estimator  for  the  larger  change  magnitudes 
examined. 
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3.  Multivariate  Weibull  Vignettes 

Table  26  shows  the  location  estimator  success  rate  and  IQR  power  for  the 
multivariate  Weibull  cases. 


Table  26.  Change-point  localization  success  rate  and  IQR  for  the  five¬ 
dimensional  multivariate  Weibull  simulated  observations  with 
1,000  simulations  per  vignette.  A  change  of  magnitude  A  is 
inserted  in  the  scale  parameter  /3  at  observation  r=  101  of  N  =  200. 
Success  rate  is  defined  as  the  fraction  of  times  the  change  is 
detected  at  a  0.05  test  level  and  change-point  estimator  equal  to 
T  ±  5.  IQR  is  in  parentheses. 

Tweib;  P  =  5 

A 
0.25 
0.50 
0.75 
1.00 


HF:  SPD  test  and  edges  from  the  minimum  Half  Factor 
A'NN:  Composite  test  and  edges  from  the  ^-Nearest  Neighbors 


(5  Jump 

HF 

&NN 

0.05  (53) 

0.02  (56) 

0.28(18) 

0.21  (22) 

0.56  (8) 

0.51  (9) 

0.71  (6) 

0.69  (5) 

The  minimum  half  factor  achieves  more  success  than  the  &NN  edges  for  the 
multivariate  Weibull  with  a  scale  parameter  change,  although  it  is  a  close  distinction. 

4.  Summary  of  Change-Point  Estimator  Comparisons 

For  location  changes,  estimators  based  on  prior  edge  counts  derived  from  the 
minimum  half  factor  have  a  higher  success  rate  than  those  of  the  &NN,  but  for  scale 
change,  the  success  rate  for  A'NN  prior  edge  counts  is  generally  higher.  We  come  to  a 
similar  conclusion  with  larger  window  sizes  than  the  0.05  V  width  tabulated  here. 
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V.  CHANGE  ANALYSIS  TECHNIQUES:  AN  APPLICATION 


In  this  chapter,  we  illustrate  our  proposed  change  detection  and  localization 
methods  by  applying  them  to  the  fire  detection  test  data  of  Gottuk  et  al.  (2002).  Our 
graph-theoretic  methods  assume  data  independence,  which  may  be  violated  in  data 
collected  from  sensors  over  a  period  of  time.  We  show  how  our  methods  can  be  applied 
to  overcome  this  situation.  We  also  demonstrate  a  modification  that  can  be  employed  to 
reduce  the  sensitivity  of  our  statistical  tests  to  signals  that  should  not  be  regarded  as 
requiring  detection.  Rather  than  the  offline  change  detection  for  which  our  methods  are 
designed,  fire  detection  requires  online  change  analysis.  Accordingly,  we  implement  our 
analysis  in  a  near  real-time  manner. 

A.  PROBLEM  DESCRIPTION 

In  this  section,  we  describe  the  data  collected  and  three  challenges  they  present. 
We  examine  a  multivariate  data  set  from  one  of  a  series  of  24  tests  as  described  in  Gottuk 
et  al.  (2002)  and  JiJi  et  al.  (2003).  These  tests  were  conducted  by  the  Naval  Research 
Laboratory  (NRL)  onboard  a  decommissioned  U.S.  Naval  vessel,  ex-USS  SHADWELL. 
The  tests  were  part  of  a  larger  effort  to  “develop  an  early  warning  fire  detection  (EWFD) 
system  that  is  highly  immune  to  nuisance  alarms”  (Gottuk  et  al.,  2002,  p.  1).  The  goal  of 
this  series  of  tests  was  to  determine  the  behavior  of  a  real-time  integrated  alarm  system 
under  stimuli  resulting  from  combustion.  Each  of  the  24  tests  feature  a  combustion  event 
involved  a  different  pairing  of  shipboard  location  and  combustible  material.  NRL 
analyzed  the  data  from  each  test  with  a  probabilistic  neural  network  algorithm.  The 
algorithm  was  trained  with  data  from  more  than  40  combustion  events  conducted 
previous  to  the  24  tests  (JiJi  et  al.,  2003).  Although  we  prepare  the  data  as  described  in 
Section  B  of  this  chapter,  our  methods  do  not  rely  on  training  data. 

1.  Sensor  Locations  and  Types 

Gottuk  et  al.  (2002)  positioned  14  clusters  of  fire  detection  sensors  on  the  second 

and  third  decks  (floors)  covering  12  areas  in  the  forward  section  of  the  decommissioned 

ship.  Figures  18  and  19  show  the  sensor  cluster  positions.  The  deck  numbering  is  from 
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In  each  of  the  14  clusters,  four  different  sensors  were  employed:  an  ionization 
smoke  detector,  a  photoelectric  smoke  detector,  a  carbon  monoxide  detector,  and  a 
carbon  dioxide  detector.  The  sensors  remain  in  continuous  operation  throughout  the  test 
series.  We  limit  consideration  to  the  ionization  and  photoelectric  smoke  detectors.  During 

the  control  period,  the  carbon  dioxide  sensors  show  non-stationarity,  and  the  carbon 
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monoxide  sensors  show  spiking  from  possible  radio  interference  (see  Appendix  D).  The 
ionization  and  photoelectric  sensors  were  mounted  overhead,  2.9  m  above  the  deck. 

2.  Test  Stages  and  Sensor  Response 

In  each  of  the  24  alarm  system  tests,  sensor  data  were  recorded  at  two-second 
intervals.  A  notional  test  timeline  is  shown  in  Figure  20.  Data  recording  began  at  the  start 
of  a  pre-stimulus  period  of  several  minutes  prior  to  the  test  start  time.  Source  ignition 
served  as  a  time  reference  point  for  a  given  test.  The  gap  between  the  start  of  the  test  and 
event  initiation  ranged  from  five  to  40  minutes.  Ten  to  75  minutes  after  event  initiation, 
after  alarms  would  have  been  expected  to  trigger,  the  investigators  applied  ventilation  to 
clear  atmospheric  contaminants  and  establish  conditions  for  the  next  test.  Data  collection 
for  a  test  ended  after  a  few  minutes  of  ventilation. 
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Figure  20.  Notional  stages  for  fire  tests  onboard  ex-USS  SHADWELL. 


The  promptness  and  level  of  sensor  response  vary  with  the  concentration  of 
airborne  products  from  a  combustion  source.  A  major  fire  or  smoke  event  quickly 
produces  large  changes  in  combustion  sensor  measurements.  Smoldering  events  produce 
responses  that  are  more  subtle  and  slower  to  develop.  Figure  21  illustrates  the  sensor 
responses  to  combustion  stimuli  in  two  different  test  scenarios.  Depicted  on  the  left  are 
data  from  the  two  sensors  at  EWFD  Location  3,  the  closest  location  to  a  second  deck 
F-76  diesel  fuel-spill  fire.  On  the  right,  we  plot  data  from  the  two  sensors  at  EWFD 
Location  10,  the  closest  location  to  a  third  deck  smoldering  computer  monitor.  We 
standardize  these  measurements  using  the  sample  means  and  standard  deviations  of  pre¬ 
stimulus  data  (data  collected  prior  to  the  start  of  the  test).  For  the  diesel  fuel  fire  event, 

rapid  change  is  evident  in  each  of  the  standardized  measurements  within  seconds  of  the 
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fuel  fire  initiation.  For  the  smoldering  computer  monitor  event,  the  sensor  responses  are 
less  immediate. 


Figure  21.  Sensor  response  to  combustion  events.  To  the  left,  standardized 
sensor  data  from  Location  3,  closest  to  an  F-76  fuel  spill  fire.  To  the 
right,  standardized  sensor  data  from  Location  10,  closest  to  a 
smoldering  computer  monitor.  Sensors  are  ionization  (top)  and 
photoelectric  (bottom)  smoke  detectors.  The  horizontal  axes  mark  the 
time  from  event  initiation  in  seconds. 


Both  ionization  and  photoelectric  sensors  use  gas  diffusion  into  the  unit  to 
produce  measurements  (JiJi  et  al.,  2003).  The  ionization  sensor  contains  a  radioactive 
isotope,  Americium-241,  that  produces  ionization  in  the  air.  Under  normal  conditions,  the 
ionized  air  allows  current  to  flow  between  two  electrodes  in  body  of  the  sensors.  The 
current  is  continuously  compared  to  the  current  between  electrodes  in  a  sealed  reference 
chamber.  Under  fire  conditions,  ions  attach  to  smoke  particles  entering  the  sensor,  and 
the  current  drops  below  the  reference  current.  The  photoelectric  sensor  contains  a  light 
source  and  an  offset  photodiode  receiver.  Under  normal  conditions,  light  from  the  source 
does  not  reach  the  receiver.  Under  fire  conditions,  smoke  particles  entering  the  sensor 
interact  with  the  light,  deflecting  some  light  to  the  photodiode  receiver.  Both  sensors 
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report  their  respective  measures  in  units  of  percent  obscuration  per  unit  length  (NFPA, 
2008). 

In  the  more  than  1,000  observations  during  the  control  periods  of  both  tests  of 
Figure  21,  the  photoelectric  sensors  only  take  values  in  the  set  {-0.25,  -0.24,  ...,  0.27}. 
Ties  may  be  present  when  solving  for  the  minimum  subgraphs  among  observations  in  the 
absence  of  a  stimulus,  however,  Figure  21  shows  that  the  photoelectric  sensor  response  in 
the  presence  of  a  combustion  event  is  well  above  the  control-period  values. 

We  demonstrate  our  graph-theoretic  methods  in  the  test  case  of  a  smoldering 
computer  monitor,  which  proved  difficult  to  detect  in  the  analysis  reported  by  Gottuk  et 
al.  (2002).  The  smoldering  computer  monitor  was  located  in  the  third  deck  engineering 
storeroom  (see  Figure  19).  We  consider  data  only  from  the  five  third-deck  EWFD 
Locations  (10,  11,  12,  13,  and  14),  of  which  Locations  10  and  11  were  closest  to  the 
combustion  source.  Considering  each  location  separately,  there  are  five  two-dimensional 
data  streams. 


3.  Challenges  in  Applying  Graph- Theoretic  Change  Detection 
Techniques 

Although  having  pre-stimulus  “control”  data  is  not  strictly  necessary  for  applying 
our  methods,  it  is  useful  in  the  present  situation  for  two  reasons.  First,  it  provides  a  useful 
baseline  for  standardization.  Second,  it  allows  us  to  examine  if,  and  how,  the  sensor 
measurements  are  correlated  with  respect  to  time.  Collected  prior  to  the  test  start  time, 
these  observations  are  assumed  to  be  representative  of  the  background  environment. 
From  them,  we  observe  the  characteristics  of  the  sensors  without  contamination  from 
outside  stimuli.  We  assume  that  the  pre-stimulus  data  are  free  from  any  initial  effects, 
e.g.,  we  expect  any  instrumentation  bum-in  to  be  complete. 

In  our  evaluation  of  the  pre-stimulus  observations,  we  find  evidence  of 
dependency  in  the  form  of  autocorrelation.  Autocorrelation  is  the  correlation  of  a  series 
of  measurements  at  one  point  in  time  with  the  same  series  at  a  different  point  in  time.  An 
echo  is  an  example  of  autocorrelation.  Time  series  data,  such  as  the  sensor  data  we 
consider  in  the  present  example,  are  frequently  characterized  by  dependencies  such  as 
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autocorrelation  (Box  &  Jenkins,  1976).  It  is  important  to  remove  autocorrelation  because 
our  minimum  subgraph  algorithms  rely  on  independence  in  the  data.  Under  the  null 
hypothesis,  we  assume  that  any  observation  has  an  equal  likelihood  of  being  adjacent  to 
any  other  observation  in  the  subgraph.  If  autocorrelation  is  not  removed,  observations 
would  be  more  likely  to  match  to  their  immediate  neighbors  in  the  measurement 
sequence. 

We  assume  that  the  pre-stimulus  observations  from  each  individual  sensor  form  a 
second-order  stationary  time  series.  By  this,  we  assert  that  the  mean  (p),  variance  (<72), 
and  autocorrelations  are  constant  for  each  sensor.  We  express  the  autocorrelation  (p  s)  of 
a  second-order  stationary  time  series  Y„  t=  1,  2,  ...,  at  a  particular  lag  s,  s  =  1,  2,  ...,  in 
terms  of  the  mean  and  variance  of  the  time  series  (Box  &  Jenkins,  1976): 

<7 

The  numerator  and  denominator  of  (55)  are  the  autocovariance  of  the  time  series  at  lag  .s’ 
and  the  variance  of  the  time  series,  respectively. 

An  autoregressive  (AR)  model  expresses  simultaneous  autocorrelation  of  more 
than  one  lag  value.  Such  a  model,  fit  to  our  data,  could  provide  the  means  to  remove 
autocorrelation,  if  present.  We  use  the  autocorrelation  function  (ACF)  and  the  partial 
autocorrelation  function  (PACF)  to  detect  autocorrelation  and  estimate  parameters  of  the 
AR  model. 

We  estimate  the  autocorrelation  for  different  lag  values,  and  plot  the  resulting 
ACF,  ps,  as  a  function  of  the  lag  value  5.  The  acf  function  in  R  produces  an  ACF  plot, 

and  displays  95  percent  probability  bands  assuming  that  the  autocorrelation  at  a  particular 
lag  value  is  equal  to  zero.  An  ACF  value  outside  the  bands  indicates  autocorrelation.  We 
do  not  know  the  population  mean  and  variance  and  must  estimate  them  from  our  data 
(Box  &  Jenkins,  1976): 
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(56) 


The  PACF  is  a  correlation  conditioned  on  the  previous  observations,  and  is 
determined  by  solving  what  are  known  as  the  Yule-Walker  equations  (Box  &  Jenkins, 
1976).  The  pacf  function  in  R  displays  95  percent  probability  bands  assuming  that  the 
partial  autocorrelation  at  a  particular  lag  value  is  equal  to  zero.  An  AR  model  for  lag 
values  {1,  ...,p}  has  the  following  representation  (Box  &  Jenkins,  1976): 


p 

Y,~M  =  lts{Yt-s-p)  +  ^  £v-’£n  are  iidiV (o, cr ) . 


(57) 


If  the  true  AR  model  order  is  p,  the  PACF  is  0  at  lag  s,  for  s  =  p  +  1,  p  +  2,  .... 

The  change  detection  methods  presented  in  this  dissertation  are  designed  to  detect 
any  type  of  change,  without  regard  to  whether  the  signal  is  large  or  small.  The  goal  of  fire 
detection  is  to  detect  effects  that  are  sufficiently  large  to  warrant  the  activation  of  an 
alarm.  Low-level  stimuli,  for  example  due  to  a  person  with  a  lit  cigarette  entering  a  room 
where  a  smoke  detector  has  been  placed,  should  not  generate  an  alarm.  Triggering  false 
alarms  can  lead  to  complacency: 

Consider  the  consequences  of  a  high  [smoke  detector]  false  alarm  rate  on 
the  performance  of  the  subsequent  human  monitor.  A  busy  human  monitor 
may  soon  learn  to  ignore  the  smoke  detector’s  alarm  signal,  considering  it 
a  false  alarm  and  not  worthy  of  a  shift  in  attention  from  more  pressing 
duties.  The  performance  of  the  overall  smoke  detector-human  monitor 
system  would  be  worse  than  if  the  smoke  detector  were  set  to  emit  fewer 
alarms.  (Sorkin  &  Woods,  1985,  p.  52) 

Consequently,  an  adaptation  must  be  made  to  our  graph-theoretic  techniques  to  allow 
low-level  stimuli  to  not  signal  a  change. 

In  addition  to  adequately  handling  data  dependency  and  reducing  sensitivity  to 
low-level  change,  successful  employment  in  this  illustration  requires  our  methods  to 
respond  to  the  immediacy  of  a  fire  situation. 
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B.  DATA  PREPARATION 


In  this  section,  we  discuss  data  handling  techniques  to  overcome  the  three 
difficulties  presented  in  Section  A  of  this  chapter. 

1.  Diagnosing  and  Mitigating  Autocorrelation  of  Sensor  Measurements 

We  demonstrate  the  estimation  and  mitigation  of  autocorrelation  with  the 
Location  10  ionization  sensor.  Figure  22  depicts  EWFD  Location  10  ionization  sensor 
pre-stimulus  data,  for  which  there  are  551  observations.  Although  these  measurements 
fluctuate,  they  appear  to  track  the  immediately  prior  observation,  indicating  possible 
autocorrelation. 


1400  1200  1000  800  600  400 

Time  until  event  initiation  (seconds) 


Figure  22.  Location  10  ionization  sensor  pre-stimulus  data  for  the  smoldering 

computer  monitor  test  (551  observations). 

We  inspect  the  estimated  autocorrelation  function  (ACF)  and  partial 
autocorrelation  function  (PACF)  to  determine  the  type  of  time-series  dependence,  if  any, 

that  may  be  present.  If  dependence  is  indicated,  we  use  the  R  function  ar  to  estimate  the 
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order  and  the  parameters  of  an  AR  model.  The  order  is  chosen  by  minimizing  the  Akaike 
Information  Criterion  (AIC).  A  fuller  autoregressive-moving  average  (ARMA)  model 
(Box  &  Jenkins,  1976)  may  be  considered  if  the  order  of  the  AR  model  is  large.  We 
calculate  the  residuals  of  the  estimated  AR  model,  and  re-evaluate  the  estimated  ACF  and 
PACF  for  indications  of  time-series  dependence  not  removed  by  the  AR  model.  The 
process  of  extracting  residuals  from  a  time-series  model  that  has  little  or  no  indication  of 
time-series  dependence  among  them  is  called  whitening. 

We  again  turn  to  the  ionization  sensor  data  from  EWFD  Location  10  to  illustrate 
the  whitening  process.  Plots  of  the  ACF  and  PACF  applied  to  centered  pre-stimulus  data 
are  shown  in  Figure  23.  Autocorrelation  is  indicated  by  one  or  more  lag  values  of  the 
ACF  exceeding  the  95  percent  probability  bands  around  the  value  of  zero,  under  the  null 
hypothesis  that  there  is  no  autocorrelation.  The  PACF  plot  gives  indication  that  an  AR 
model  of  order  one  may  be  appropriate  to  remove  the  autocorrelation. 


ACF 


PACF 


Lag  Lag 

Figure  23.  Autocorrelation  plots  for  centered  pre-stimulus  data  from  the 

Location  10  ionization  sensor.  To  the  left,  the  estimated  autocorrelation 
function  plot;  to  the  right,  the  estimated  partial  autocorrelation  function 
plot.  The  blue  dashed  lines  indicate  the  95  percent  probability  bands  for 

the  absence  of  correlation. 
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Having  detected  autocorrelation,  we  use  the  ar  function  in  R  to  obtain  estimated 


AR  model  parameters  for  the  centered  Location  10  ionization  sensor  data: 

y,  =  0.9225  ( y,_,  -  y)  -  0.0499  ( y,_2  -  y) + 0.0057  ( -  y) 

-0.0758(  yt_4  -  y) + 0.0657  ( _y,_5  -  y )+ 0.0336  ( y,_6  -  y )  (58) 

+0.0568(yt_7  -  v) -0.2158 (  v,_8  -  ,v)  + 0.1  134()ji_9  -  v). 

We  apply  this  AR  model  to  the  centered  data,  and  then  test  the  whitened  data  for 
autocorrelation  to  examine  the  effectiveness  of  whitening.  As  shown  in  Figure  24,  the 
whitened  data  show  little  autocorrelation. 


ACF 


PACF 
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Figure  24.  Autocorrelation  plots  for  centered,  whitened  pre-stimulus  data  from 
the  Location  10  ionization  sensor.  To  the  left,  the  estimated 
autocorrelation  function  plot;  to  the  right,  the  estimated  partial 
autocorrelation  function  plot.  The  blue  dashed  lines  indicate  the  95 
percent  probability  bands  for  the  absence  of  correlation. 


Our  methods  assume  that  the  data  are  statistically  independent,  as  distances 
between  observations  form  the  basis  for  obtaining  the  minimum  half  factor  and  the 
^-nearest  neighbors  (TcNN)  subgraphs,  and  we  appear  to  have  achieved  that.  We  complete 
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the  transformation  by  dividing  the  centered,  whitened,  pre-stimulus  data  by  its  standard 
deviation  ((j) .  A  more  robust  measure,  such  as  the  IQR,  could  be  used. 

An  assumption  of  normality  is  not  required  by  our  methods.  An  examination  of 
the  normal  quantile-quantile  plot  in  Figure  25  shows  that  the  transformed  observations 
from  the  Location  10  ionization  sensor  do  not  severely  depart  from  normality.  The  AR 
model  we  estimate  for  the  Location  10  photoelectric  sensor  is  as  follows: 

yt=  0-0898  (yM-y).  (59) 

The  transformed  observations  from  the  Location  10  photoelectric  sensor  are  seen  to  be 
nearly-discretely  valued. 


-3-2-10123  -3-2-10123 

Theoretical  Quantiles  Theoretical  Quantiles 

Figure  25.  Normal  quantile-quantile  plots  for  centered,  whitened,  standardized 
pre- stimulus  data  from  the  Location  10  ionization  sensor  (left)  and  the 
Location  10  photoelectric  sensor  (right).  Discrete  values  of  the 
photoelectric  sensor  are  noted  in  Subsection  A. 2. 


To  describe  dependence  of  the  transformed  data  across  sensors,  we  produce  a 
scatterplot  of  the  Location  10  sensors  in  Figure  26.  There  is  no  apparent  correlation 
between  the  two  sensors.  This  is  interesting  because  it  appears  that  the  information 
provided  by  the  two  sensors  really  is  different.  What  causes  perturbations  in  one  sensor  is 
not  causing  perturbations  in  the  other,  and  vice  versa. 
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Transformed  Ionization  Measurements 

Figure  26.  Scatterplot  of  transformed  measurements  for  Location  10. 

We  check  the  transformed  data  for  correlation  across  the  two  sensors  at  each 
location  by  estimating  the  correlation  coefficients.  The  estimates  shown  in  Table  27 
indicate  relative  independence  between  the  two  sensors  at  each  location,  although  the  95- 
percent  confidence  interval  for  the  correlation  coefficient  does  not  include  zero  at 
Location  10.  The  indicated  correlation  between  the  Location  10  ionization  sensor  and  the 
Location  10  photoelectric  sensor  in  the  transformed  pre-stimulus  observations  is  weak 
and  does  not  require  mitigation. 


Table  27.  Estimated  value  and  95 -percent  confidence  interval  for  the 
correlation  coefficient  between  transformed  pre-stimulus 
observations  of  ionization  and  photoelectric  sensors  from  each 

location. 

Location  10  Location  11  Location  12  Location  13  Location  14 
Estimate  -0.090  -0.009  0.073  0.001  0.011 

95%  Cl  (-0.173,-0.006)  (-0.093,0.075)  (-0.011,0.156)  (-0.083,0.085)  (-0.073,0.095) 
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2. 


Detection  above  a  Threshold  Level 


To  prevent  unwanted  alarms  due  to  low-level  stimuli,  we  apply  an  upper 
threshold  value  to  the  sensor  data  using  the  following  process: 

(1)  The  pre-stimulus  observations  are  centered  and  whitened  as  described  in 
the  previous  section. 

(2)  Determine  a  threshold  value  such  that  values  below  the  threshold  are 
considered  not  to  justify  an  alarm.  In  practice,  this  value  should  reflect 
expert  judgment  about  the  strength  of  signal  considered  actionable  for  fire 
detection. 

(3)  Apply  a  locally  weighted  scatterplot  smoother  to  the  whitened  data.  We 
use  a  LOESS  smoother  with  span  equal  to  0.25  (Cleveland  &  Grosse, 
1990). 

(4)  Truncate  the  smoothed  data  by  taking  the  minimum  of  the  smoothed 
values  and  the  threshold  value. 

(5)  Produce  a  set  of  residuals  by  subtracting  the  truncated,  smoothed  data 
from  the  whitened  test  observations. 

We  call  these  residuals  the  thresholded  data  and  use  them  in  our  subsequent  analysis.  For 
the  sake  of  illustration,  we  use  the  upper  quartile  plus  1.5  times  the  interquartile  range 
(IQR)  as  a  threshold  value.  Our  threshold,  although  not  reflective  of  knowledge  of  fire 
detection,  is  often  used  for  statistical  outlier  detection  in  the  construction  of  box  plots 
(Devore,  2011).  As  with  whitening,  we  apply  thresholding  separately  for  each  sensor  and 
location.  For  comparison  purposes,  we  calculate  the  probability  that  an  observation 
sampled  from  the  standard  normal  distribution  exceeds  the  threshold  as  described  above. 
Standard  normal  data  are  expected  to  have  first  and  third  quartiles  of  -0.6745  and  0.6745, 
respectively.  The  expected  interquartile  range  is  then  1.349.  The  probability  of  a  standard 
normal  data  exceeding  the  threshold  value  of  2.698  is  0.00349  -  roughly  3.5  times  in  a 
thousand. 
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Figure  27  provides  a  graphical  illustration  of  thresholding,  using  the  Location  10 
ionization  sensor.  On  the  left  of  Figure  27,  we  show  the  whitened  pre-stimulus 
observations  in  purple,  the  resulting  threshold  in  black,  whitened  test  observations  in 
blue,  smoothed  data  in  orange,  and  the  truncated  data  in  dashed  red.  On  the  right  of 
Figure  27  is  the  thresholded  data.  Thresholding  masks  changes  that  remain  under  the 
threshold  value,  but  allows  changes  of  greater  magnitude  to  pass  through. 


Whitened  Ionization  Measurement 


Thresholded  Ionization  Measurement 


0  1000  2000  3000 

Time  since  event  initiation  (seconds) 


Figure  27.  Illustration  of  thresholding,  using  the  Location  10  ionization  sensor. 

To  the  left,  whitened  data  is  shown  in  purple  for  pre-stimulus 
observations  and  blue  for  test  observations,  plotted  against  the  time 
since  event  initiation.  The  upper  outlier  threshold  determined  from  pre¬ 
stimulus  data  is  the  black  line.  The  loess  smoother  is  shown  in  orange. 

The  truncated  loess  smoother — the  quantity  to  subtract  from  the 
whitened  data — is  the  red  dashed  line.  To  the  right,  the  thresholded 
data — the  difference  between  the  whitened  data  and  the  truncated  loess 
smoother — are  plotted  in  green  against  the  time  since  event  initiation. 


We  use  a  similar  process  to  transform  the  pre-stimulus  data  from  the  other  nine 
sensors.  Table  28  displays  the  mean,  AR  order,  AR  parameters,  threshold,  and  standard 
deviation  obtained  for  each  sensor.  By  transforming  each  sensor  individually,  we 
compensate  for  differences  across  sensors  of  the  same  type,  such  as  length  of  service  and 
manufacturing  variability. 
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Table  28.  Transformation  parameters  for  centering,  whitening,  thresholding, 
and  standardizing  third  deck  sensor  data  from  the  test  of  interest 
(smoldering  computer  monitor  in  engineering  storeroom). 
Parameters  are  based  on  551  pre-stimulus  observations  at  each 

location. 


Location  and 
Sensor 

Mean 

AR 

Order 

AR  Parameters 

Threshold 

SD 

10  Ionization 

0.201 

9 

0.9225 

-0.0758 

0.0568 

-0.0499 

0.0657 

-0.2158 

0.0057 

0.0336 

0.1134 

0.270 

0.107 

1 1  Ionization 

0.760 

2 

0.8610 

0.0690 

0.378 

0.139 

12  Ionization 

0.051 

1 

0.8665 

0.317 

0.122 

13  Ionization 

0.902 

8  ' 

0.9421 

0.0108 

-0.0170 

-0.0306 

-0.0032 

0.0940 

-0.0661 

-0.0102 

0.327 

0.124 

14  Ionization 

0.332 

1 

0.9498 

0.368 

0.132 

10  Photoelectric 

0.018 

1 

0.0898 

0.189 

0.060 

11  Photoelectric 

0.000 

1 

0.1348 

0.151 

0.051 

12  Photoelectric 

0.014 

1 

0.1107 

0.260 

0.087 

13  Photoelectric 

-0.037 

5  ' 

0.1837 

-0.0483 

-0.0285 

-0.0982 

-0.0104 

0.225 

0.063 

14  Photoelectric  -0.018  2  0.1885  -0.0673  0.057  0.056 


AR:  autoregressive  model 


3.  Achieving  Detection  in  Near  Real-Time 

Ideally,  we  would  analyze  the  data  sequentially.  A  fire  alarm  loses  its 
effectiveness  if  data  processing  time  causes  delay.  For  example,  the  computation  time 
required  to  obtain  a  minimum  half  factor  makes  online  processing  of  new  sensor 
observations  every  two  seconds  impractical.  NN  subgraphs  are  more  amenable  to  real¬ 
time  updating  (Li,  Yang,  &  Han,  2004).  In  order  to  provide  a  common  focus  for  applying 
our  methods,  we  limit  ourselves  to  batch  processing.  This  is  done  by  analyzing 
overlapping  windows  of  data  for  change  every  minute  (30  observations  per  update). 
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C.  RESULTS 

We  now  apply  the  change  detection  and  localization  methods  that  we  described  in 
Chapters  III  and  IV  to  the  EWFD  sensors.  Before  doing  so  we  discuss  several  factors  that 
affect  realism  in  the  handling  of  the  data. 

1.  Inclusion  of  Reference  Data 

A  decision  must  be  made  whether  to  apply  our  change  detection  methods  relative 
to  a  pre-stimulus  data  set,  or  to  analyze  observations  in  moving  windows  that  are 
potentially  subject  to  stimulus.  We  limit  our  window  size  (N)  due  to  the  computational 
cost  of  the  minimum  half  factor  and  the  desire  to  reduce  time  latency.  We  combine  a 
reference  set  of  pre-stimulus  observations  with  labels  1,2 with  real-time 

observations  l,2,...,n  ,  where  n0  +  n]  =  N .  We  include  pre-stimulus  observations  in 

hopes  of  detecting  more  subtle  changes,  such  as  a  slow  build-up  of  gases,  that  take  longer 
to  develop  than  the  length  of  our  window  size.  A  slow,  upward  drift  may  not  register  as  a 
change  when  only  analyzing  observations  after  the  drift  began.  To  compare  the  effects  of 
including  reference  data,  we  conduct  three  sets  of  analyses,  outlined  in  Table  29. 


Table  29.  Durations  of  pre-stimulus  observations  and  real-time  observations 
for  three  analysis  schemes.  Observations  are  taken  every  two 

seconds. 


Pre- stimulus 

Real  Time 

Total 

Observations 

Duration 

Observations 

Duration 

observations 

fa) 

(min:  sec) 

fa) 

(min:  sec) 

(N) 

(1) 

0 

0:00 

200 

6:40 

200 

(2) 

45 

1:30 

155 

5:10 

200 

(3) 

90 

3:00 

110 

3:40 

200 

We  conduct  analysis  in  windows  of  200  observations  (1)  using  no  reference  set  and  a 
moving  window  of  six  minutes  40  seconds  ( N  =  200  real-time  observations),  (2)  using  a 
fixed  reference  set  of  90  seconds  (no  =  45  pre-stimulus  observations)  and  a  moving 
window  of  five  minutes  10  seconds  (n\  =  155  real-time  observations),  and  (3)  using  a 
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fixed  reference  set  of  180  seconds  (no  =  90)  and  a  moving  window  of  three  minutes  40 
seconds  (n\  =  110). 

2.  Tests  from  Minimum  Subgraphs 

We  illustrate  change  detection  and  localization  using  both  the  minimum  half 
factor  and  the  /cNN  subgraphs  for  each  200-observation  analysis  window.  For  each 
method,  we  use  permutation  tests  to  obtain  p- values  for  testing  purposes  as  described  in 
Chapter  III.  In  order  for  a  change  to  be  detected  based  on  the  minimum  half  factor,  the 
minimum  half  factor  Sum  of  Pair-Differences  (SPD)  statistic  defined  in  (22)  with 
k=  100,  must  be  less  than  or  equal  to  the  fifth  percentile  of  10,000  permutations  of  the 
index  labels  for  an  or  =  0.05  level  of  test.  In  order  for  a  change  to  be  detected  based  on 
the  A'NN  subgraph  with  k  =  100,  one  of  two  criteria  of  a  composite  A'NN  test  must  be 
satisfied:  either  the  A'NN  sum  of  pair-differences  (22)  or  the  A'NN  prior  edge  count  (39) 
must  be  less  than  or  equal  to  the  2.5th  percentile  of  10,000  permutations  of  the  index 
labels.  We  use  a  critical  value  of  0.025  for  each  of  these  two  tests  to  control  the  Type  I 
error  of  the  composite  test  at  or  below  a=  0.05. 

In  addition  to  the  actual  change-detection  capabilities,  a  second  consideration  in 
choosing  a  matching  method  is  the  computational  burden.  As  noted  in  Chapter  III, 
obtaining  the  minimum  half  factor  is  best  done  with  proprietary  optimization  software 
such  as  CPLEX.  Nearest  neighbors  is  easily  implemented  in  R,  and  there  are  several  open 
source  A'NN  packages  available  for  R.  The  empirically  observed  worst-case  computation 
times  for  both  matching  methods  grow  on  the  order  of  N  '  ,  but  our  simulations  have 
shown  that  the  kNN  computation  times  with  k  =  N/2  are  approximately  40  times  faster 
than  minimum  half  factor  computation  times  for  the  same  sample  size. 

The  data  are  centered,  whitened  using  the  estimated  AR  model  parameters  shown 
in  Table  28,  and  thresholded  as  described  in  Section  B  of  this  chapter.  The  transformed 
data  are  then  used  to  obtain  the  minimum  subgraphs  (minimum  half  factor,  kNN)  that  we 
use  for  deriving  test  statistics  and  change  localization  estimates.  We  compute  the 
statistics  for  minimum  half  factor  and  kNN-based  tests,  apply  permutation  tests  to  obtain 
p-valucs,  and  signal  a  change  as  described  in  Chapter  III. 
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If  a  change  is  detected,  the  change  point  is  then  estimated  and  the  alarm  is 
triggered.  The  change-point  estimate  is  obtained  using  the  location  estimation  procedure 
introduced  in  Chapter  IV  Section  A,  based  on  the  standardized  prior  edge  counts.  On  our 
computing  platform,  detailed  in  Chapter  III  Section  B,  the  analyses  described  in  the 
immediately  preceding  paragraphs  take  at  most  10  seconds  for  each  new  window  when 
using  10,000  permutations,  making  near-sequential  analysis  feasible.  Our  computations 
were  completed  on  a  64-bit  Dell  OptiPlex  990  Windows  7  platform  with  an  Intel  Core  i5- 
2400  3.1  GHz  processor,  8  GB  RAM,  using  R  version  3.0.1  with  packages  cplexAPI 
(Gelius-Dietrich,  2013)  and  FNN  (Beygelzimer  et  al.,  2013)  and  using  CPLEX  version 
12.5.  We  stop  the  analysis  once  the  algorithm  detects  change.  We  conduct  a  total  of  thirty 
tests:  at  each  of  the  five  EWFD  locations  (10,  11,  12,  13,  14),  covering  the  three 
reference  window  sizes  (0,  90,  180  seconds),  and  the  two  minimum  subgraph  methods 
(minimum  half  factor,  fcNN). 

3.  Alarm  Times  and  Graphical  Results 

A  summary  of  the  alarm  times  determined  is  shown  in  Table  30.  Alarm  times  are 
discrete,  taking  values  from  55  seconds  to  2,395  seconds  in  increments  of  60  seconds. 
Ventilation  was  introduced  in  the  experiment  2,420  seconds  after  event  initiation.  Alarm 
times  from  the  NRL  analysis  are  shown  at  the  bottom  of  Table  30. 
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Table  30.  Alarm  times  (and  change-point  estimates),  in  seconds  after  event 
initiation,  by  ship  location  (see  Figure  19),  duration  of  pre-stimulus 
observations,  and  method  of  analysis.  Minimum  half  factor  refers 
to  forming  100  undirected  matches  for  each  of  the  200 
observations  in  the  analysis  window,  minimizing  the  total  distance. 


Reference 

Minimum  Half  Factor 

kNN 

Observations 

Alarm  Time  and 

Alarm  Time  and 

Location 

(seconds) 

Change  Estimate 

Change  Estimate 

10* 

none 

1,555  (1,465) 

1,495  (1,443) 

90 

1,555  (1,465) 

1,495  (1,443) 

180 

1,555  (1,465) 

1,495  (1,443) 

11* 

none 

1,015  (927) 

1,495  (1,423) 

90 

1,015  (927) 

1,495  (1,423) 

180 

1,315  (1,083) 

1,495  (1,423) 

12 

none 

none 

none 

90 

none 

none 

180 

none 

none 

13 

none 

1,735  (1,609) 

1,975  (1,843) 

90 

1,735  (1,609) 

1,935  (1,843) 

180 

1,675  (1,609) 

2,155  (2,091) 

14 

none 

none 

955  (645) 

90 

none 

none 

180 

none 

none 

In  the  analysis  of  Gottuk  et  al.  (2002),  based  on  four  sensors,  Location  10  alarmed  at 
2,037  seconds  and  Location  11  alarmed  at  2,103  seconds.  No  other  locations  alarmed. 


Locations  10  and  11,  which  are  closest  to  the  smoldering  computer  monitor, 
dominate  the  earliest  alarms.  Both  graph-theoretic  methods  produce  alarms,  although  the 
minimum  half  factor  SPD  test  does  not  produce  any  alarms  at  Location  14.  Location  14 
has  one  alarm  from  the  composite  kNN  test,  and  it  is  the  first  alarm.  Overall,  including 
pre-stimulus  observations  does  not  positively  affect  detection  in  this  scenario. 

The  next  two  figures  expand  on  the  results  for  Location  10  analyzed  with  180 
seconds  of  pre-stimulus  data  by  the  minimum  half  factor  SPD  test.  Figure  28  shows  the 
final  window  of  transformed  observation  from  Location  10.  Conducting  the  minimum 
half  factor  SPD  test  produces  an  alarm  1,555  seconds  after  event  initiation. 
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Time  since  event  initiation  (seconds) 

Figure  28.  Plot  of  Location  10  transformed  ionization  (top)  and  photoelectric 
(bottom)  sensor  observations  vs.  time  since  event  initiation.  The  chart 
includes  180  seconds  of  pre-stimulus  observations.  Change  is  signaled 
using  the  minimum  half  factor  sum  of  pair-differences  test.  The  red 
vertical  line  denotes  the  change  point  estimated  with  piecewise 
regression.  The  alarm  triggers  10  seconds  after  the  raw  observations  for 
this  window  are  collected,  at  1,555  seconds  after  event  initiation. 


Figure  29  shows  the  piecewise  regression  on  the  prior  edge  count  associated  with 
this  particular  alarm.  The  optimal  break  point  for  the  regression  is  1,465  seconds  after 
event  initiation.  This  change-point  estimate  is  taken  to  be  the  point  where  the  combustion 
effects  were  sufficient  to  justify  an  alarm.  Figures  corresponding  to  the  remaining  alarms 
of  Table  30  are  given  in  Appendix  D. 
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Time  since  event  initiation  (seconds) 


Figure  29.  Plot  of  piecewise  regression  conducted  on  the  standardized  count  of 
minimum  half  factor  prior  edges  with  indices  lower  than  the  index  point 
vs.  time  since  event  initiation.  The  estimated  change  point  is  1,465 
seconds  after  event  initiation.  The  chart  includes  180  seconds  of  pre¬ 
stimulus  observations.  Data  are  from  ionization  and  photoelectric 
sensors  at  Location  10. 


D.  DISCUSSION 

The  example  presented  in  this  chapter  demonstrates  a  real-world  application  of 

the  graph-theoretic  change  detection  and  localization  techniques  described  in  Chapters  III 

and  IV.  Although  these  methods  require  data  to  be  statistically  independent,  respond  to 

change  of  any  nature,  and  were  designed  for  off-line  change  detection,  we  adapt  them  for 

this  application  featuring  dependent  data,  suppression  of  false  alarms,  and  rapid 

detection.  The  presence  of  a  set  of  control  data  allows  the  fitting  of  a  time-series  model, 

based  only  on  current  and  past  observations,  that  removes  much  of  the  temporal 

dependence.  Low-level  stimuli  that  affect  the  sensors  but  do  not  justify  the  setting  of  an 

alarm  are  handled  by  filtering  with  a  scatterplot  smoother  that  is  truncated  at  a 

determined  threshold  value.  And,  although  we  process  the  data  off-line  in  batches,  we  do 

so  in  a  manner  that  near-real  time  detectability  may  be  possible.  While  the  NRL  analysis 
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is  predicated  on  training  data,  our  methods  have  no  such  requirement.  This  application 
also  provides  assessment  of  the  following: 

•  Inclusion  of  Control  Data 

We  find  that  the  decision  to  either  include  or  exclude  an  initial  run  of  pre-stimulus 
observations  does  not  make  an  appreciable  difference  in  the  example  we  consider.  Both 
minimum  half  factor  and  kNN  alarm  times  were  mostly  the  same  with  or  without  the 
reference  observations,  and  in  two  cases,  the  alarm  times  were  slower  when  including 
pre-stimulus  observations.  It  is  expected,  however,  that  situations  in  which  change  is 
gradual  may  benefit  from  the  use  of  control  data. 

•  Minimum  Subgraph  Comparison 

The  minimum  half  factor  method  edges  the  kNN  method  in  terms  of  fastest  time 
to  alarm.  Of  the  15  comparisons  between  the  minimum  half  factor  SPD  test  and 
composite  kNN  test  shown  in  Table  30,  the  minimum  half  factor  signals  the  first  alarm 
six  times,  while  the  composite  kNN  test  supplies  the  first  alarm  four  times. 

•  Threshold  Exploration 

In  Section  B  of  this  chapter,  we  explain  the  rationale  and  process  of  thresholding 
for  removal  of  weak  signals  that  do  not  justify  alarms.  The  purpose  of  setting  the 
threshold  is  to  appropriately  balance  the  nuisance  of  false  alarms  with  the  risk  of  missed 
detection.  Figure  30  shows  median  alarm  times  obtained  with  different  threshold  values 
(the  second  horizontal  axis  value  is  the  threshold  value  of  the  illustration).  Increasing  the 
threshold  has  the  expected  effect  of  increasing  the  time  to  alarm  for  the  minimum  half 
factor  SPD  test,  while  the  median  alarm  time  of  the  composite  kNN  test  does  not  change 
once  the  lowest  threshold  is  invoked. 
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Number  of  IQR  deviations  above  the  third  quartile 

Figure  30.  Effects  of  chosen  threshold  on  the  median  alarm  time  for  each 
method  (minimum  half  factor  and  ANN),  taking  the  median  across  the 
fifteen  tests  of  each  method. 
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VI.  CONCLUSIONS 


A.  SUMMARY  OF  MAIN  FINDINGS 

The  need  for  statistical  techniques  that  detect  and  localize  distributional  change  in 
a  sequence  of  multivariate  observations  continues  to  grow  in  modern  applications.  Graph- 
theoretic  techniques  are  important  members  of  this  class  by  virtue  of  their  ability  to  detect 
change  of  a  general  nature  without  the  imposition  of  strong  assumptions  about  the 
sampling  distribution  of  the  data.  The  methods  proposed  in  this  dissertation  compare  well 
with  the  best  parametric  procedures  when  assumptions  needed  to  justify  the  latter  are 
satisfied.  Once  change  has  been  detected,  we  provide  a  technique  for  using  information 
from  minimum  subgraphs  to  estimate  the  change  point  with  reasonable  accuracy.  We  also 
present  a  real-world  example  in  which  apparent  limitations  to  the  application  of  our 
methods  are  surmounted. 

B.  DIRECTIONS  FOR  FUTURE  RESEARCH 

There  are  opportunities  for  future  research  in  the  areas  of  subgraph  formation  and 
usage,  change-point  localization,  dimension  reduction,  and  change-type  diagnosis. 

1.  Subgraph  Formation  and  Usage 

Minimum  subgraphs  and  test  statistics  derived  from  them  that  are  different  from 
those  contained  in  this  dissertation  provide  fertile  ground  for  exploration.  Choosing  a 
different  configuration  of  edge  score  function  and  degree  of  matches  may  provide 
increased  detection  power.  Unlike  the  minimum  half  factor,  which  must  be  completely 
reformed  for  each  new  observation,  the  kNN  subgraph  easily  admits  a  new  observation 
with  less  change  in  the  graphical  structure  (Li,  Yang,  &  Han,  2004).  Statistics  derived 
from  the  kNN  subgraph  can  feasibly  conduct  sequential  change  detection  and 
localization,  which  we  do  not  explore  in  this  dissertation. 

2.  Change-Point  Localization 

Future  research  of  interest  would  adapt  change-point  localization  using  minimum 

subgraphs  to  situations  other  than  a  single  abrupt  change,  such  as  localizing  the  initiation 
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of  gradual  change  or  allowing  for  multiple  change  points.  Such  an  adaptation  may  also 
allow  for  diagnosing  non-stationarity  (i.e.,  change  initiated  at  the  beginning  of  the 
observation  sequence)  by  returning  “no  change  point.”  This  would  be  the  diagnosis  for  an 
instrument  undergoing  continuous  drift  from  the  onset  of  data  collection. 

3.  Dimension  Reduction  and  Localization 

The  techniques  proposed  in  this  dissertation  localize  the  change-point  in  terms  of 
the  observation  sequence,  but  do  not  localize  the  variables  responsible  for  the  detected 
distributional  change.  Can  graph-theoretic  methods  help  to  localize  the  change  to  the 
changed  dimension  or  dimensions,  and/or  reduce  the  number  of  dimensions  to  only  those 
of  interest?  In  certain  applications  such  as  image  analysis  or  wide-area  monitoring, 
contiguity  constraints  may  allow  for  reducing  the  search  space.  In  the  manner  of  Shirabe 
(2009),  a  contiguity  constraint  set  restricts  change  by  rejecting  the  null  hypothesis  only  if 
the  change  detected  is  physically  sensible.  Such  a  constraint  set  defines  dimensional 
adjacencies  and  requires  there  to  be  a  path  between  the  dimensions  triggering  change,  as 
in  a  network  flow  problem.  Contiguity  requirements  restrict  the  dimensional  choices, 
making  the  problem  more  manageable. 

4.  Change-Type  Diagnosis 

Through  use  of  thresholding  applied  to  fire  detection  sensor  data,  we  adapted  our 
method  to  detect  only  certain  types  of  changes.  Other  applications  may  also  benefit  from 
knowledge  of  the  type  of  change  detected  (location,  scale,  higher  moment).  Information 
from  the  minimum  k-factor  and  kNN  subgraphs  may  be  able  to  provide  diagnostic 
information  regarding  the  likely  nature  of  the  detected  change. 
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APPENDIX  A.  THE  CURSE  OF  DIMENSIONALITY 


There  are  many  meanings  that  researchers  bestow  on  the  phrase  the  “curse  of 
dimensionality,”  but  its  published  origins  go  back  to  the  early  days  of  dynamic 
programming.  Prior  to  penning  that  exact  language,  RAND  Corporation  researcher 
Richard  Bellman  favored  more  nautical  turns  of  phrase,  which  we  are  obligated  to  point 
out  as  we  find  ourselves  at  the  Naval  Postgraduate  School.  Bellman  (1956)  dismally 
notes  that  “this  method  founders  on  the  reef  of  dimensionality”  (p.  767).  Bellman  (1957) 
catastrophically  declares,  “...we  founder  on  the  shoals  of  dimensionality”  (p.  272).  A  few 
years  later,  no  doubt  after  someone  reminded  him  that  he  worked  at  a  corporation  funded 
by  the  U.S.  Air  Force,  Bellman  (1961)  gamely  laments  of  “...the  curse  of  dimensionality, 
a  malediction  that  has  plagued  the  scientist  from  earliest  days”  (p.  94),  although  he  fails 
to  provide  a  reference. 
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APPENDIX  B.  ADDITIONAL  ENSEMBLE  TRACES 


We  present  figures  similar  to  Figure  11  in  Chapter  III  Section  C,  for  the 
multivariate  normal  null  hypothesis  (Figure  31)  and  mean  drift  and  scale  jump 
(Figure  32)  cases.  The  figures  are  based  on  100  simulations  of  iV  =  200  observations  from 
a  five-dimensional  multivariate  normal  distribution.  The  thick  black  line  is  the  critical 
value.  Red  traces  indicate  the  ESPM  statistic  exceeding  the  critical  value  based  on 
a  =  0.05  level  of  test.  Gray  traces  do  not  exceed  the  critical  value. 


0  20  40  60  80  100 

Number  of  Ensemble  Matches 


Figure  31.  The  Ensemble  Sum  of  Pair-Maxima  test  statistic  for  observations 
under  the  null  hypothesis  of  no  change,  for  matches  from  1  to  N/2. 


103 


ESPM 


0  20  40  60  80  100  0  20  40  60  80  100 

Number  of  Ensemble  Matches  Number  of  Ensemble  Matches 


Figure  32.  The  Ensemble  Sum  of  Pair-Maxima  test  statistic  for  observations 
with  a  0.5  magnitude  change  beginning  at  observation  r  =  101 ,  for 
matches  from  1  to  N/2.  To  the  left  is  the  case  of  drift  change  of 
magnitude  A  in  the  first  component  of  the  mean  vector;  to  the  right  is 
the  case  of  scale  jump  in  all  five  dimensions. 


Considering  only  the  values  at  k  =  100,  detection  power  in  the  mean  drift  case 
falls  from  0.85  to  0.76.  Detection  power  in  the  scale  jump  case  falls  from  0.98  to  0.81. 
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APPENDIX  C.  PRIOR  EDGE  TABLES 


For  the  standard  bivariate  normal  sample  of  Chapter  II  Section  A,  Table  31  shows 
the  10  minimum  half  factor  matches  of  each  vertex  as  well  as  the  prior  edge  count,  its 
expected  value,  standard  deviation,  and  standardized  value.  Table  32  shows  the  incoming 
nearest  neighbor  edge  source  vertices  for  each  vertex  as  well  as  the  prior  incoming  edge 
count,  its  expected  value,  standard  deviation,  and  standardized  value. 


Table  31.  Minimum  half  factor  matches,  prior  edge  count  (/,),  expected 
value,  standard  deviation,  and  standardized  value  for  20 
observations  drawn  from  a  standard  bivariate  normal  distribution. 


Vertex  i 

Vertices  Matched  to 

i 

h 

V 

a 

Zi 

1 

3 

4 

5 

6 

7 

11 

15 

17 

18 

20 

0 

0.00 

0.00 

— 

2 

3 

5 

10 

11 

13 

14 

15 

16 

18 

19 

0 

0.53 

0.50 

-1.05 

3 

1 

2 

6 

7 

8 

9 

11 

15 

18 

20 

2 

1.05 

0.69 

1.38 

4 

1 

5 

6 

10 

12 

14 

15 

16 

17 

18 

1 

1.58 

0.82 

-0.71 

5 

1 

2 

4 

6 

7 

10 

12 

15 

17 

18 

3 

2.11 

0.91 

0.98 

6 

1 

3 

4 

5 

7 

11 

15 

17 

18 

20 

4 

2.63 

0.98 

1.39 

7 

1 

3 

5 

6 

8 

9 

11 

15 

18 

20 

4 

3.16 

1.04 

0.81 

8 

3 

7 

9 

11 

12 

13 

14 

16 

19 

20 

2 

3.68 

1.08 

-1.56 

9 

3 

7 

8 

11 

12 

13 

14 

16 

19 

20 

3 

4.21 

1.10 

-1.10 

10 

2 

4 

5 

12 

13 

14 

15 

16 

17 

19 

3 

4.74 

1.12 

-1.56 

11 

1 

2 

3 

6 

7 

8 

9 

13 

19 

20 

7 

5.26 

1.12 

1.56 

12 

4 

5 

8 

9 

10 

13 

14 

16 

17 

19 

5 

5.79 

1.10 

-0.72 

13 

2 

8 

9 

10 

11 

12 

14 

16 

19 

20 

6 

6.32 

1.08 

-0.29 

14 

2 

4 

8 

9 

10 

12 

13 

16 

17 

19 

7 

6.84 

1.04 

0.15 

15 

1 

2 

3 

4 

5 

6 

7 

10 

17 

18 

8 

7.37 

0.98 

0.64 

16 

2 

4 

8 

9 

10 

12 

13 

14 

17 

19 

8 

7.89 

0.91 

0.12 

17 

1 

4 

5 

6 

10 

12 

14 

15 

16 

18 

9 

8.42 

0.82 

0.71 

18 

1 

2 

3 

4 

5 

6 

7 

15 

17 

20 

9 

8.95 

0.69 

0.08 

19 

2 

8 

9 

10 

11 

12 

13 

14 

16 

20 

9 

9.47 

0.50 

-0.95 

20 

1 

3 

6 

7 

8 

9 

11 

13 

18 

19 

10 

10.00 

0.00 

— 
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Table  32.  Vertex  sources  for  incoming  10-nearest  neighbors  edges,  prior  incoming  edge  count  (/,),  expected  value, 
standard  deviation,  and  standardized  value  for  20  observations  drawn  from  a  standard  bivariate  normal 

distribution. 


Vertex  i 

Vertices  Sending  Edges  to  i 

h 

H 

a 

Zi 

1 

3 

5 

6 

7 

11 

15 

17 

18 

20 

0 

0 

0 

0 

2 

1 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

1 

1 

0 

0 

3 

1 

2 

5 

6 

7 

8 

9 

10 

11 

14 

15 

18 

19 

20 

2 

1.47 

0.61 

0.87 

4 

1 

2 

3 

5 

6 

10 

12 

13 

14 

15 

16 

17 

18 

19 

3 

2.21 

0.72 

1.1 

5 

1 

4 

6 

17 

18 

2 

1.05 

0.8 

1.18 

6 

1 

5 

7 

15 

17 

18 

2 

1.58 

0.92 

0.46 

7 

11 

20 

0 

0.63 

0.64 

-0.99 

8 

2 

3 

7 

9 

10 

11 

12 

13 

14 

15 

16 

18 

19 

20 

3 

5.16 

0.95 

-2.27 

9 

3 

7 

8 

11 

12 

13 

16 

19 

20 

3 

3.79 

1.1 

-0.72 

10 

1 

2 

3 

4 

5 

6 

8 

9 

11 

12 

13 

14 

15 

16 

17 

18 

19 

8 

8.05 

0.69 

-0.08 

11 

1 

3 

6 

7 

8 

9 

20 

6 

3.68 

1.08 

2.15 

12 

13 

14 

16 

0 

1.74 

0.81 

-2.15 

13 

2 

4 

8 

9 

10 

12 

14 

16 

19 

20 

6 

6.32 

1.08 

-0.29 

14 

2 

4 

5 

8 

9 

10 

12 

13 

16 

17 

19 

8 

7.53 

1.03 

0.46 

15 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

16 

17 

18 

19 

20 

14 

14 

0 

0 

16 

2 

4 

8 

10 

12 

13 

14 

17 

19 

7 

7.11 

0.91 

-0.12 

17 

1 

4 

5 

6 

15 

18 

5 

5.05 

0.76 

-0.07 

18 

1 

2 

3 

4 

5 

6 

7 

10 

15 

17 

10 

8.95 

0.69 

1.53 

19 

2 

3 

4 

8 

9 

10 

11 

12 

13 

14 

15 

16 

20 

12 

12.32 

0.46 

-0.68 

20 

7 

9 

11 

3 

3 

0 

0 
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For  the  change  case,  Table  33  shows  the  10  minimum  half  factor  matches  of  each 
vertex  as  well  as  the  prior  edge  count,  its  expected  value,  standard  deviation,  and 
standardized  value.  Table  34  shows  the  incoming  nearest  neighbor  edge  source  vertices 
for  each  vertex  as  well  as  the  prior  incoming  edge  count,  its  expected  value,  standard 
deviation,  and  standardized  value. 


Table  33.  Minimum  half  factor  matches,  prior  edge  count  (/,),  expected 
value,  standard  deviation,  and  standardized  value  for  20 
observations  drawn  from  a  bivariate  normal  distribution,  with  an 
abrupt  change  in  the  mean  vector  from  (0,  0)  to  (3,  0)  starting  at 

observation  11. 


Vertex  i 

Vertices  Matched  to  i 

h 

M 

a 

Zi 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

16 

0 

0 

0 

— 

2 

1 

3 

4 

5 

6 

7 

8 

9 

10 

13 

1 

0.53 

0.50 

0.95 

3 

1 

2 

4 

5 

6 

7 

8 

10 

18 

20 

2 

1.05 

0.69 

1.38 

4 

1 

2 

3 

5 

6 

7 

8 

10 

11 

19 

3 

1.58 

0.82 

1.74 

5 

1 

2 

3 

4 

6 

7 

8 

9 

10 

13 

4 

2.11 

0.91 

2.08 

6 

1 

2 

3 

4 

5 

7 

8 

10 

19 

20 

5 

2.63 

0.98 

2.41 

7 

1 

2 

3 

4 

5 

6 

8 

9 

10 

16 

6 

3.16 

1.04 

2.73 

8 

1 

2 

3 

4 

5 

6 

7 

9 

10 

16 

7 

3.68 

1.08 

3.07 

9 

1 

2 

5 

7 

8 

10 

12 

14 

15 

17 

5 

4.21 

1.10 

0.72 

10 

1 

2 

3 

4 

5 

6 

7 

8 

9 

16 

9 

4.74 

1.12 

3.82 

11 

4 

12 

13 

14 

15 

16 

17 

18 

19 

20 

1 

5.26 

1.12 

-3.82 

12 

9 

11 

13 

14 

15 

16 

17 

18 

19 

20 

2 

5.79 

1.10 

-3.43 

13 

2 

5 

11 

12 

14 

15 

17 

18 

19 

20 

4 

6.32 

1.08 

-2.15 

14 

9 

11 

12 

13 

15 

16 

17 

18 

19 

20 

4 

6.84 

1.04 

-2.73 

15 

9 

11 

12 

13 

14 

16 

17 

18 

19 

20 

5 

7.37 

0.98 

-2.41 

16 

1 

7 

8 

10 

11 

12 

14 

15 

17 

18 

8 

7.89 

0.91 

0.12 

17 

9 

11 

12 

13 

14 

15 

16 

18 

19 

20 

7 

8.42 

0.82 

-1.74 

18 

3 

11 

12 

13 

14 

15 

16 

17 

19 

20 

8 

8.95 

0.69 

-1.38 

19 

4 

6 

11 

12 

13 

14 

15 

17 

18 

20 

9 

9.47 

0.50 

-0.95 

20 

3 

6 

11 

12 

13 

14 

15 

17 

18 

19 

10 

10 

0 

— 

107 


Table  34.  Vertex  sources  for  incoming  10-nearest  neighbors  edges,  prior  incoming  edge  count  (/,),  expected  value, 

standard  deviation,  and  standardized  value  for  20  observations  drawn  from  a  bivariate  normal  distribution,  with 
an  abrupt  change  in  the  mean  vector  from  (0,  0)  to  (3,  0)  starting  at  observation  11. 


Vertex  i 

Vertices  Sending  Edges  to  i 

k 

V 

a 

Zi 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

16 

19 

0 

0.00 

0.00 

— 

2 

1 

3 

4 

5 

6 

7 

8 

9 

10 

13 

16 

19 

20 

1 

0.68 

0.46 

0.68 

3 

1 

2 

4 

5 

6 

7 

10 

2 

0.74 

0.66 

1.91 

4 

1 

2 

3 

5 

6 

7 

8 

10 

3 

1.26 

0.81 

2.15 

5 

1 

2 

3 

4 

6 

7 

8 

9 

10 

16 

19 

4 

2.32 

0.9 

1.87 

6 

3 

4 

7 

10 

2 

1.05 

0.8 

1.18 

7 

1 

2 

3 

4 

5 

6 

8 

9 

10 

6 

2.84 

1.04 

3.04 

8 

1 

2 

3 

4 

5 

6 

7 

9 

10 

11 

13 

14 

15 

16 

17 

18 

19 

20 

7 

6.63 

0.48 

0.76 

9 

1 

2 

3 

4 

5 

6 

7 

8 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

8 

8.00 

0.00 

- 

10 

1 

2 

3 

4 

5 

6 

7 

8 

8 

3.79 

1.1 

3.81 

11 

9 

12 

13 

14 

15 

16 

17 

18 

19 

20 

1 

5.26 

1.12 

-3.82 

12 

11 

13 

14 

15 

17 

18 

19 

20 

1 

4.63 

1.09 

-3.33 

13 

8 

9 

11 

12 

14 

15 

16 

17 

18 

19 

20 

4 

6.95 

1.07 

-2.76 

14 

11 

12 

13 

15 

16 

17 

18 

20 

3 

5.47 

1.03 

-2.41 

15 

11 

12 

13 

14 

17 

18 

20 

4 

5.16 

0.95 

-1.22 

16 

1 

2 

5 

8 

9 

11 

12 

13 

14 

15 

17 

18 

19 

10 

10.26 

0.85 

-0.31 

17 

12 

14 

15 

3 

2.53 

0.6 

0.8 

18 

9 

11 

12 

13 

14 

15 

16 

17 

19 

20 

8 

8.95 

0.69 

-1.38 

19 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

20 

18 

18.00 

0.00 

- 

20 

11 

12 

18 

3 

3.00 

0.00 

- 
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APPENDIX  D.  SUPPLEMENTARY  APPLICATION  FIGURES 


This  appendix  features  figures  to  supplement  Chapter  V  Sections  A  and  B. 

A.  CHALLENGING  DATA 

As  discussed  in  Chapter  V  Section  A,  the  carbon  monoxide  and  carbon  dioxide 
sensor  data  from  this  collection  were  not  practically  useable.  The  carbon  monoxide  data 
shown  on  the  left  in  Figure  33  show  spiking  in  the  absence  of  outside  stimuli,  while  the 
carbon  dioxide  data  shown  on  the  right  in  Figure  33  show  non-stationarity. 

The  carbon  monoxide  sensor  spiking  is  attributed  to  radio  frequency  interference 
(RFI).  The  specific  effects  of  RFI  on  carbon  monoxide  sensors  are  not  well  documented 
in  literature,  but  residential  carbon  monoxide  alarms  are  known  to  respond  to  amateur 
radio  transmissions  (Low,  2007).  We  are  not  relying  on  post-processing  of  the  data,  nor 
do  we  consider  methods  for  filtering  these  RFI  spikes.  The  spiking  seems  to  affect 
several  minutes  of  data.  We  could  develop  a  scheme  to  reject  spiking  data  although, 
without  direct  access  to  these  sensors,  it  is  difficult  to  determine  when  such  a  scheme 
should  consider  the  spike  complete.  Ultimately,  we  choose  to  not  use  the  carbon 
monoxide  data  due  to  the  RFI  spikes. 

The  carbon  dioxide  standardized  quantity  shows  an  unexplained  increase  in  value 
prior  to  event  initiation.  Each  sensor  appears  to  wander  in  a  different  way,  increasing  or 
decreasing  independently  of  the  other  sensors  in  the  absence  of  any  test  stimuli.  We  do 
not  know  the  reason  for  this  non- stationary  behavior.  Our  methods  are  not  equipped  to 
handle  such  non-stationarity,  so  we  choose  to  remove  the  carbon  dioxide  sensor  from 
inclusion  in  further  analysis.  In  our  demonstration,  we  focus  on  data  collected  from  the 
ionization  and  photoelectric  sensors. 
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Figure  33.  Centered  sensor  pre-stimulus  data  for  the  five  third  deck  EWFD 
locations.  To  the  left,  carbon  monoxide  measurements,  and  to  the  right, 
carbon  dioxide  measurements.  Each  vertical  axis  is  scaled  to  match  the 
range  of  the  centered  sensor  response. 


B.  GRAPHICAL  RESULTS 

Figures  34  through  43  convey  additional  results  of  our  trial  analogous  to  Figures 
28  and  29  of  Chapter  V  Section  C.  All  figures  show  the  200-observation  window  in 
which  change  is  detected.  Figures  34  through  42  show  Focations  10,  11,  and  13,  with 
minimum  half  factor  results  to  the  left  and  results  from  k-nearest  neighbors  to  the  right. 
The  minimum  half  factor  sum  of  pair-differences  detected  no  changes  at  Focation  14;  we 
show  nearest  neighbor  results  in  Figure  43.  In  all  figures,  the  upper  plots  show  the 
standardized  prior  edge  count  and  the  fitted  piecewise  regression  line  for  the  window  in 
which  a  change  is  detected.  The  middle  plots  show  the  transformed  ionization  sensor 
data;  the  lower  plots  show  the  transformed  photoelectric  sensor  data.  The  red  vertical 
lines  denote  the  change  points  estimated  with  piecewise  regression  for  the  minimum 
subgraph.  The  alarms  trigger  10  seconds  after  the  raw  observations  for  these  windows  are 
collected. 
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Figure  34.  Plots  for  Location  10  with  no  pre-stimulus  observations.  The 
estimated  change  points  are  1,465  and  1,443  seconds  after  event 
initiation;  the  alarms  trigger  1,555  and  1,495  seconds  after  event 

initiation. 
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Figure  35.  Plots  for  Location  10  with  90  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  1,465  and  1,443  seconds  after  event 
initiation;  the  alarms  trigger  1,555  and  1,495  seconds  after  event 

initiation. 
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Figure  36.  Plots  for  Location  10  with  180  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  1,465  and  1,443  seconds  after  event 
initiation;  the  alarms  trigger  1,555  and  1,495  seconds  after  event 

initiation. 


Figure  37.  Plots  for  Location  1 1  with  no  pre-stimulus  observations.  The 
estimated  change  points  are  927  and  1,423  seconds  after  event 
initiation;  the  alarms  trigger  1,015  and  1,495  seconds  after  event 

initiation. 
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Figure  38.  Plots  for  Location  1 1  with  90  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  927  and  1,423  seconds  after  event 
initiation;  the  alarms  trigger  1,015  and  1,495  seconds  after  event 

initiation. 
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Figure  39.  Plots  for  Location  11  with  180  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  1,083  and  1,423  seconds  after  event 
initiation;  the  alarms  trigger  1,315  and  1,495  seconds  after  event 

initiation. 
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Figure  40.  Plots  for  Location  13  with  no  pre-stimulus  observations.  The 
estimated  change  points  are  1,609  and  1,843  seconds  after  event 
initiation;  the  alarms  trigger  1,735  and  1,935  seconds  after  event 

initiation. 
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Figure  41.  Plots  for  Location  13  with  90  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  1,609  and  1,843  seconds  after  event 
initiation;  the  alarms  trigger  1,735  and  1,935  seconds  after  event 

initiation. 
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Figure  42.  Plots  for  Location  13  with  180  seconds  of  pre-stimulus  observations. 

The  estimated  change  points  are  1,609  and  2,091  seconds  after  event 
initiation;  the  alarms  trigger  1,675  and  2,155  seconds  after  event 

initiation. 
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Figure  43.  Plots  for  Location  14  with  no  pre-stimulus  observations.  The 

estimated  change  point  is  645  seconds  after  event  initiation;  the  alarm 
triggers  955  seconds  after  event  initiation. 


Figures  44  through  48,  one  for  each  location,  show  the  transformed  data  across 


the  entire  duration  of  the  test.  The  graph-theoretic  change-point  estimates  and  alarms 
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based  on  the  graph-theoretic  methods  are  depicted  as  vertical  lines  in  red  and  heavy 
black,  respectively.  The  upper  and  lower  plots  show  the  transformed  ionization  and 
photoelectric  senor  data,  respectively.  Each  vertical  axis  is  scaled  to  capture  the  range  of 
the  transformed  data. 


Figure  44.  Location  10  transformed  measurements,  change-point  estimates 
(1,443  and  1,465  seconds  after  event  initiation)  and  alarms  (1,495  and 
1,555  seconds  after  event  initiation). 
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Figure  45.  Location  1 1  transformed  measurements,  change-point  estimates 

(927,  1,083,  and  1,423  seconds  after  event  initiation)  and  alarms  (1,015, 
1,315,  and  1,495  seconds  after  event  initiation). 
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Figure  46.  Location  12  transformed  measurements.  No  alarms  are  triggered. 
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Figure  47.  Location  13  transformed  measurements,  change-point  estimates 
(1,609,  1,843,  and  2,091  seconds  after  event  initiation)  and  alarms 
(1,675,  1,735,  19,35,  1,975,  and  2,155  seconds  after  event  initiation). 
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Figure  48.  Location  14  transformed  measurements,  change-point  estimate  (645 
seconds  after  event  initiation)  and  alarm  (955  seconds  after  event 

initiation). 
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